Paper: On the Role of Lexical Features in Sequence Labeling

ACL ID: D09-1119
Title: On the Role of Lexical Features in Sequence Labeling
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2009

We use the technique of SVM anchoring to demonstrate that lexical features extracted from a training corpus are not necessary to obtain state-of-the-art results on tasks such as Named Entity Recognition and Chunking. While standard models require as many as 100K distinct features, we derive models with as few as 1K features that perform as well or better on different domains. These robust reduced models indicate that the way rare lexical features contribute to classification in NLP is not fully understood. Contrastive error analysis (with and without lexical features) indicates that lexical features do contribute to resolving some semantic and complex syntactic ambiguities, but we find this contribution does not generalize outside the training corpus. As a general strateg...