Paper: Two Decades of Unsupervised POS Induction: How Far Have We Come?

ACL ID D10-1056
Title Two Decades of Unsupervised POS Induction: How Far Have We Come?
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make since there is little consensus on evaluation framework, and many papers evaluate against only one or two competitor systems. Here we evaluate seven different POS induction systems spanning nearly 20 years of work, using a variety of measures. We show that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches. Since most of these systems were developed and tested using data from the WSJ corpus, we compare their generalization abilities by testing on both WSJ and the multilingual Multext-East corpus. Finally, we introduce the idea of evaluating systems base...