Paper: Categorizing Unknown Words: Using Decision Trees To Identify Names And Misspellings

ACL ID A00-1024
Title Categorizing Unknown Words: Using Decision Trees To Identify Names And Misspellings
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2000
Authors

This paper introduces a system for categorizing unknown words. The system is based on a multi-component architecture where each component is responsible for identifying one class of unknown words. The focus of this paper is the components that identify names and spelling errors. Each component uses a decision tree architecture to combine multiple types of evidence about the unknown word. The system is evaluated using data from live closed captions, a genre replete with a wide variety of unknown words.
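
The multi-component design described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy lexicon, the three evidence types (capitalization, edit distance to the nearest lexicon word, word length), the thresholds, and all function names are assumptions chosen for illustration, and the hand-written rules stand in for decision trees that would in practice be learned from data.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicon; a real system would use a much larger one.
LEXICON = {"the", "weather", "report", "tonight", "president"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def features(word: str) -> Dict[str, float]:
    """Illustrative evidence types (assumed, not the paper's feature set)."""
    lowered = word.lower()
    return {
        "capitalized": float(word[:1].isupper()),
        "min_edit_distance": float(min(edit_distance(lowered, w) for w in LEXICON)),
        "length": float(len(word)),
    }


def is_misspelling(f: Dict[str, float]) -> bool:
    """Hand-written stand-in for a learned tree: close to a known word."""
    if f["min_edit_distance"] > 2:
        return False
    return f["length"] >= 4


def is_name(f: Dict[str, float]) -> bool:
    """Hand-written stand-in for a learned tree: capitalized and far from the lexicon."""
    if f["capitalized"] == 0.0:
        return False
    return f["min_edit_distance"] > 2


# One component per class of unknown word, each deciding independently.
COMPONENTS: Dict[str, Callable[[Dict[str, float]], bool]] = {
    "name": is_name,
    "misspelling": is_misspelling,
}


def categorize(word: str) -> List[str]:
    """Return the labels of every component that claims the word."""
    if word.lower() in LEXICON:
        return []  # known word, nothing to categorize
    f = features(word)
    return [label for label, component in COMPONENTS.items() if component(f)]
```

Under these assumptions, `categorize("wether")` is claimed only by the misspelling component and `categorize("Kowalski")` only by the name component. Keeping each class in its own component means a further class of unknown word could be supported by adding an entry to `COMPONENTS` without touching the existing ones.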