Paper: Enriching The Knowledge Sources Used In A Maximum Entropy Part-Of-Speech Tagger

ACL ID W00-1308
Title Enriching The Knowledge Sources Used In A Maximum Entropy Part-Of-Speech Tagger
Venue 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Session Main Conference
Year 2000
Authors

This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall and 86.91% on previously unseen words.
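A minimal Python sketch of how feature templates of this kind can feed a maximum entropy (log-linear) tagger, where each tag's probability given the context is exp(sum of active feature weights) divided by a normalizer over all tags. Everything below is an illustrative assumption, not the paper's actual feature set. This includes the template names, the hypothetical `AUXILIARIES` list and four-word window, the toy vocabulary, and the hand-set weights. A real tagger learns its weights from the Penn Treebank. The sketch covers item (i) and a crude stand-in for item (ii). Particle disambiguation (iii) is not sketched.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical auxiliary list: an auxiliary to the left hints at VBN over VBD.
AUXILIARIES = {"have", "has", "had", "having", "be", "is", "are", "was", "were", "been"}

def extract_features(words: List[str], i: int, vocab: Set[str]) -> List[str]:
    """Binary contextual predicates for position i (illustrative templates only)."""
    w = words[i]
    prev = words[i - 1].lower() if i > 0 else "<s>"
    feats = [f"w={w.lower()}", f"prev={prev}"]
    if w.lower() not in vocab:
        # (i) capitalization and shape cues, fired only for unknown words;
        # capitalization is split by whether the word starts the sentence
        if w[:1].isupper():
            feats.append("unk:cap:first" if i == 0 else "unk:cap:mid")
        if any(ch.isdigit() for ch in w):
            feats.append("unk:digit")
        if "-" in w:
            feats.append("unk:hyphen")
        feats.append(f"unk:suf3={w[-3:].lower()}")
    # (ii) crude tense cue (assumed): any auxiliary in the 4 preceding words?
    window = [x.lower() for x in words[max(0, i - 4):i]]
    if any(x in AUXILIARIES for x in window):
        feats.append("aux_in_left_window")
    return feats

def tag_distribution(feats: List[str], tags: List[str],
                     weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Log-linear model: p(t|h) = exp(sum_j w_j * f_j(h, t)) / Z(h)."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())  # normalizer Z(h) over all candidate tags
    return {t: e / z for t, e in exps.items()}

if __name__ == "__main__":
    vocab = {"we", "visited", "last", "year"}
    words = ["We", "visited", "Grenoble", "last", "year"]
    feats = extract_features(words, 2, vocab)
    print(feats)
    # Toy weights chosen by hand for illustration, not learned.
    weights = {("unk:cap:mid", "NNP"): 2.0, ("unk:cap:mid", "NN"): 0.5,
               ("unk:suf3=ble", "JJ"): 1.0}
    dist = tag_distribution(feats, ["NNP", "NN", "JJ"], weights)
    print(max(dist, key=dist.get))  # NNP
```

In this sketch the unknown-word templates only fire for out-of-vocabulary tokens, so their weights affect only those tokens, while the shared normalizer keeps the output a proper distribution over tags.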