Paper: Word-Sense Disambiguation Using Statistical Models Of Roget's Categories Trained On Large Corpora

ACL ID C92-2070
Title Word-Sense Disambiguation Using Statistical Models Of Roget's Categories Trained On Large Corpora
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1992
Authors

This paper describes a program that disambiguates English word senses in unrestricted text using statistical models of the major Roget's Thesaurus categories. Roget's categories serve as approximations of conceptual classes. The categories listed for a word in Roget's index tend to correspond to sense distinctions; thus selecting the most likely category provides a useful level of sense disambiguation. The selection of categories is accomplished by identifying and weighting words that are indicative of each category when seen in context, using a Bayesian theoretical framework. Other statistical approaches have required special corpora or hand-labeled training examples for much of the lexicon. Our use of class models overcomes this knowledge acquisition bottleneck, enabling training on unre...
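The category-selection step described in the abstract can be illustrated with a naive-Bayes-style scorer. This is a minimal sketch under stated assumptions, not the paper's implementation. The category names, the toy training contexts, the `INDEX` mapping, the add-alpha smoothing, and the uniform category prior are all hypothetical choices made for illustration. Each context word receives the weight log(P(w | category) / P(w)), and the candidate category with the largest summed weight is selected. Under a uniform prior this ranks categories the same way as the naive Bayes posterior, because dividing by P(w) does not depend on the category.

```python
import math
from collections import Counter

# Hypothetical toy training data: for each category, a few context
# windows (word lists) observed around words listed under that category.
TRAINING = {
    "ANIMAL": [["bird", "wings", "nest", "water"],
               ["species", "bird", "feathers"]],
    "TOOLS": [["lift", "steel", "construction", "heavy"],
              ["operator", "lift", "machine"]],
}

# Hypothetical thesaurus index: candidate categories for an ambiguous word.
INDEX = {"crane": ["ANIMAL", "TOOLS"]}


def train(contexts_by_category):
    """Count word occurrences per category and over all categories."""
    cat_counts = {cat: Counter(w for ctx in ctxs for w in ctx)
                  for cat, ctxs in contexts_by_category.items()}
    global_counts = Counter()
    for counts in cat_counts.values():
        global_counts.update(counts)
    return cat_counts, global_counts


def weight(word, cat, cat_counts, global_counts, alpha=1.0):
    """log(P(w | cat) / P(w)) with add-alpha smoothing; positive when the
    word is relatively more frequent near this category than overall."""
    vocab = len(global_counts) + 1  # +1 leaves probability mass for unseen words
    cat_total = sum(cat_counts[cat].values())
    all_total = sum(global_counts.values())
    p_w_cat = (cat_counts[cat][word] + alpha) / (cat_total + alpha * vocab)
    p_w = (global_counts[word] + alpha) / (all_total + alpha * vocab)
    return math.log(p_w_cat / p_w)


def disambiguate(context, candidates, cat_counts, global_counts):
    """Return the candidate category with the highest summed weight,
    together with the per-category scores (uniform prior assumed)."""
    scores = {cat: sum(weight(w, cat, cat_counts, global_counts) for w in context)
              for cat in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    counts = train(TRAINING)
    best, scores = disambiguate("the bird built a nest".split(),
                                INDEX["crane"], *counts)
    print(best, scores)
```

Restricting `candidates` to the categories the index lists for the target word reflects the abstract's point that those categories approximate the word's senses. A realistic system would train on far larger context windows drawn from a large corpus, and add-alpha smoothing is used here only for brevity; other smoothing schemes may behave better on sparse counts.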