Paper: Fast High-Accuracy Part-of-Speech Tagging by Independent Classifiers

ACL ID C14-1110
Title Fast High-Accuracy Part-of-Speech Tagging by Independent Classifiers
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

Part-of-speech (POS) taggers can be quite accurate, but for practical use, accuracy often has to be sacrificed for speed. For example, the maintainers of the Stanford tagger (Toutanova et al., 2003; Manning, 2011) recommend tagging with a model whose per-tag error rate is 17% higher, in relative terms, than that of their most accurate model, to gain a factor of 10 or more in speed. In this paper, we treat POS tagging as a single-token independent multiclass classification task. We show that by using a rich feature set we can obtain high tagging accuracy within this framework, and by employing some novel feature-weight-combination and hypothesis-pruning techniques we can also get very fast tagging with this model. A prototype tagger implemented in Perl is tested and found to be at least 8 times faster tha...
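
To make the "single-token independent multiclass classification" framing concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a tiny hand-picked feature set, a three-tag toy tagset, and a plain multiclass perceptron as the learner; all of these are illustrative stand-ins, and the names `features`, `predict`, `train`, and `tag` are hypothetical. The only point it is meant to show is that each token's tag is chosen from features of the surrounding words alone, so no tag decision depends on any other tag decision.

```python
from collections import defaultdict

# Toy tagset and training data: assumptions for illustration only.
TAGS = ["DT", "NN", "VBZ"]
TRAIN = [
    (["the", "dog", "barks"], ["DT", "NN", "VBZ"]),
    (["a", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
]


def features(tokens, i):
    """Features for token i, drawn only from words, never from predicted tags."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<s>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [
        "bias",
        f"word={word.lower()}",
        f"suffix2={word[-2:].lower()}",
        f"prev={prev_word.lower()}",
        f"next={next_word.lower()}",
    ]


def predict(weights, feats, tags):
    """Return the tag with the highest summed feature weight."""
    scores = {t: sum(weights[(f, t)] for f in feats) for t in tags}
    # Ties are broken by the order of `tags`.
    return max(tags, key=lambda t: scores[t])


def train(sentences, tags, epochs=5):
    """Multiclass perceptron (a stand-in learner, not the paper's method)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, gold_tags in sentences:
            for i, gold in enumerate(gold_tags):
                feats = features(tokens, i)
                guess = predict(weights, feats, tags)
                if guess != gold:
                    # Promote the gold tag, demote the wrong guess.
                    for f in feats:
                        weights[(f, gold)] += 1.0
                        weights[(f, guess)] -= 1.0
    return weights


def tag(weights, tokens, tags):
    """Tag every token independently; positions could be processed in any order."""
    return [predict(weights, features(tokens, i), tags) for i in range(len(tokens))]


WEIGHTS = train(TRAIN, TAGS)
print(tag(WEIGHTS, ["the", "dog", "barks"], TAGS))
```

Because `tag` never consults a neighbouring prediction, there is no sequence search (such as Viterbi decoding) to run; the cost per token is one feature extraction and one scoring pass over the candidate tags. The feature-weight-combination and hypothesis-pruning techniques mentioned in the abstract are not reproduced here.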