Paper: Fast and Accurate Misspelling Correction in Large Corpora

ACL ID D14-1171
Title Fast and Accurate Misspelling Correction in Large Corpora
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

There are several NLP systems whose accuracy depends crucially on finding misspellings fast. However, the classical approach is based on a quadratic time algorithm with 80% coverage. We present a novel algorithm for misspelling detection, which runs in constant time and improves the coverage to more than 96%. We use this algorithm together with a cross-document coreference system in order to find proper name misspellings. The experiments confirmed significant improvement over the state of the art.