Paper: A Fast and Accurate Method for Approximate String Search

ACL ID P11-1006
Title A Fast and Accurate Method for Approximate String Search
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2011
Authors

This paper proposes a new method for ap- proximate string search, specifically candidate generation in spelling error correction, which is a task as follows. Given a misspelled word, the system finds words in a dictionary, which are most “similar” to the misspelled word. Thepaperproposesaprobabilisticapproachto the task, which is both accurate and efficient. The approach includes the use of a log linear model, a method for training the model, and an algorithm for finding the top k candidates. The log linear model is defined as a condi- tional probability distribution of a corrected word and a rule set for the correction con- ditioned on the misspelled word. The learn- ing method employs the criterion in candidate generation as loss function. The retrieval al- gorithm is efficient and i...