Source Paper Year Line Sentence
P06-1129 2006 20
The method for web query spelling correction proposed by Cucerzan and Brill (2004) is essentially based on a source channel model, but it requires iterative running to derive suggestions for very-difficult-to-correct spelling errors
P06-1129 2006 6
Investigations into query log data reveal that more than 10% of queries sent to search engines contain misspelled terms (Cucerzan and Brill, 2004)
P06-1129 2006 9
Cucerzan and Brill (2004) discussed in detail specialties and difficulties of a query spell checker, and illustrated why the existing methods could not work for query spelling correction
P06-1129 2006 50
This is because query logs are not only an up-to-date term base, but also a comprehensive spelling error repository (Cucerzan and Brill, 2004; Ahmad and Kondrak, 2005)
D07-1019 2007 61
For example, both the work of (Cucerzan and Brill, 2004; Li et al, 2006) used n-gram statistical language models trained from search engine's query logs to estimate the query string probability
D07-1019 2007 57
The complexity of query spelling correction task requires the combination of these types of evidence, as done in (Cucerzan and Brill, 2004; Li et al, 2006)
D07-1019 2007 108
This task can leverage conventional spelling correction methods such as generating candidates based on edit distance (Cucerzan and Brill, 2004) or phonetic similarity (Philips, 1990)
D07-1019 2007 27
Cucerzan and Brill (2004) first investigated this issue and proposed to use query logs to infer correct spellings of misspelled terms
D09-1093 2009 57
Cucerzan and Brill (2004) claim that an LM is much more important than the channel model when correcting Web search queries
D09-1093 2009 62
There has been recent attention on using Web search query data as a source of training data, and as a target for spelling correction (Yang Zhang and Li, 2007; Cucerzan and Brill, 2004)
D09-1154 2009 67
In particular, our work is closely related to research in spelling correction for English web queries (e.g., Cucerzan and Brill, 2004; Ahmad and Kondrak, 2005; Li et al, 2006; Chen et al, 2007)
D09-1154 2009 10
Query expansion has been shown to be effective in improving web search results in English, where different methods of generating the expansion terms have been attempted, including relevance feedback (e.g., Salton and Buckley, 1990), correction of spelling errors (e.g., Cucerzan and Brill, 2004), stemming or lemmatization (e.g., Frakes, 1992), use of manually- (e.g., Aitchison and Gilchrist, 1987) or automatically- (e.g., Rasmussen 1992) constructed thesauri, and Latent Semantic Indexing (e.g., Deerwester et al 1990)
D09-1154 2009 269
As Cucerzan and Brill (2004) point out, the process of manually creating a spelling correction candidate is seriously flawed as the intention of the original query is completely lost: for the query gogle, it is not clear out of context if the user meant goggle, google, or gogle
N09-1022 2009 59
Cucerzan and Brill (2004) pioneered the research of query spelling correction, with an excellent description of how a traditional dictionary-based speller had to be adapted to solve the realistic query correction problem
N09-1022 2009 5
Web search query correction is an important problem to solve for robust information retrieval given how pervasive errors are in search queries: it is said that more than 10% of web search queries contain errors (Cucerzan and Brill, 2004)
N09-1022 2009 63
Extending the work of Cucerzan and Brill (2004), Li et al (2006) proposed to include semantic similarity between the query and its correction candidate
N09-1022 2009 229
As Cucerzan and Brill (2004) point out, however, this method is seriously flawed in that the intention of the original query is completely lost to the annotator, without which the correction is often impossible: it is not clear if gogle should be corrected to google or goggle, or neither - gogle may be a brand new product name
N09-1022 2009 81
Source channel models are widely used for spelling and query correction (Brill and Moore, 2000; Cucerzan and Brill, 2004)
P10-1028 2010 7
First, spelling errors are more common in search queries than in regular written text: roughly 10-15% of queries contain misspelled terms (Cucerzan and Brill, 2004)
P10-1028 2010 43
Cucerzan and Brill (2004) discuss in detail the challenges of query spelling correction, and suggest the use of query logs
C10-1041 2010 37
While almost all of the spellers mentioned above are based on a pre-defined dictionary (either a lexicon against which the edit distance is computed, or a set of real-word confusion pairs), recent research on query spelling correction focuses on exploiting noisy Web corpora and query logs to infer knowledge about spellings and word usage in queries (Cucerzan and Brill 2004; Ahmad and Kondrak, 2005; Li et al, 2006; Whitelaw et al., 2009)
C10-1041 2010 10
Therefore, recent research has focused on the use of Web corpora and search logs, rather than human-compiled lexicons, to infer knowledge about spellings and word usages in search queries (e.g., Whitelaw et al., 2009; Cucerzan and Brill, 2004)
D10-1122 2010 288
Firstly, we do not suggest incorrect suggestions for valid queries unlike (Cucerzan and Brill, 2004)
D10-1122 2010 290
Secondly, we do not require query logs and other resources that are not easily available unlike (Cucerzan and Brill, 2004), (Ahmad and Kondrak, 2005)
D10-1122 2010 269
The next stream of approaches explored ways of exploiting the word's context (Golding and Roth, 1996), (Cucerzan and Brill, 2004)
D10-1122 2010 292
Thirdly, we correct the query as a whole unlike (Ahmad and Kondrak, 2005) and can handle word order changes unlike (Cucerzan and Brill, 2004)
D10-1122 2010 271
Spelling correction algorithms targeted for web-search queries have been developed making use of query logs and click-thru data (Cucerzan and Brill, 2004), (Ahmad and Kondrak, 2005), (Sun et al, 2010)
D10-1122 2010 18
As pointed out by (Cucerzan and Brill, 2004), these approaches either try to correct individual words (and will fail to correct Him Clijsters to Kim Clijsters) or employ features based on relatively wide context windows (footnote 1: In contrast, 80% of misspelled words in general text are due to single typographical errors as found by (Damerau, 1964))
D10-1122 2010 20
Spelling correction techniques meant for general purpose web-queries require large volumes of training data in the form of query logs for learning the error models (Cucerzan and Brill, 2004), (Ahmad and Kondrak, 2005)
P11-2085 2011 43
These features are usually represented by an n-gram language model (Cucerzan and Brill, 2004; Wilcox-O'Hearn et al., 2010)
P11-1091 2011 8
Spelling correction for search queries is important, because a significant portion of posed queries may be misspelled (Cucerzan and Brill, 2004)
P11-1091 2011 37
A source of statistics widely used in prior work is the query log (Cucerzan and Brill, 2004; Ahmad and Kondrak, 2005; Li et al, 2006a; Chen et al, 2007; Sun et al, 2010)
P11-1091 2011 201
Finally, we note that our algorithm is in the spirit of that of Cucerzan and Brill (2004), with a few inherent differences
W12-2012 2012 114
This approach borrows from the family of noisy-channel error-correction models (Zhang, et al, 2006; Cucerzan and Brill, 2004; Kernighan, et al, 1990)
P14-2028 2014 31
There have been attempts (Cucerzan and Brill, 2004) to apply other rules, which would overcome limitations of language and error models with compensating changes described further
P14-2028 2014 34
It is motivated by the assumption, that we are more likely to successfully correct the query if we take several short steps instead of one big step (Cucerzan and Brill, 2004). Iterative correction is hill climbing in the space of possible corrections: on each iteration we make a transition to the best point in the neighbourhood, i.e. to correction, that has maximal posterior probability P(c|q)
P14-2028 2014 11
As a result, either the misspelled word itself, or the other (less complicated, more frequent) misspelling of the same word wins the likelihood race. To compensate for this defect of the noisy channel, the iterative approach (Cucerzan and Brill, 2004) is typically used
P14-2028 2014 83
We were not able to reproduce superior performance of the iterative method over the noisy channel, reported by (Cucerzan and Brill, 2004)
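
The P14-2028 excerpts above describe iterative correction as hill climbing: at each step, move to the candidate in the neighbourhood with the highest posterior P(c|q), and stop when no neighbour scores higher. The following is a minimal sketch of that idea only, not the method of any cited paper. The names `TOY_LOG_COUNTS`, `neighbours`, `score`, and `iterative_correct` are hypothetical, the counts are invented for illustration, and a raw query-log count over candidates within edit distance 1 stands in for a real posterior built from a language model and an error model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


# Hypothetical query-log counts (invented numbers, illustration only).
TOY_LOG_COUNTS = {
    "schwarzenegger": 1000,
    "schwarznegger": 50,
    "scwarznegger": 5,
}


def neighbours(query: str) -> list[str]:
    """Logged strings within edit distance 1 of the query (assumed neighbourhood)."""
    return [c for c in TOY_LOG_COUNTS
            if c != query and edit_distance(query, c) <= 1]


def score(candidate: str) -> int:
    """Stand-in for P(c|q): the raw count in the toy log."""
    return TOY_LOG_COUNTS.get(candidate, 0)


def iterative_correct(query: str, max_iters: int = 10) -> str:
    """Hill-climb from the query to a local maximum of score()."""
    current = query
    for _ in range(max_iters):
        best = max([current, *neighbours(current)], key=score)
        if score(best) <= score(current):
            break  # no neighbour improves on the current string
        current = best
    return current


if __name__ == "__main__":
    print(iterative_correct("scwarznegger", max_iters=1))
    print(iterative_correct("scwarznegger"))
```

In this toy setup a single step from "scwarznegger" can only reach the more frequent misspelling "schwarznegger", because the correct form is two edits away; a second step then reaches "schwarzenegger". This mirrors the "several short steps instead of one big step" motivation quoted above, under the stated assumptions.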