Paper: A Bayesian Mixture Model for PoS Induction Using Multiple Features

ACL ID D11-1059
Title A Bayesian Mixture Model for PoS Induction Using Multiple Features
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011
Authors

In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to state-of-the-art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested.
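
To make the one-class-per-type idea concrete, the following is a minimal sketch of a multinomial mixture over word types that uses context features only. It is not the authors' implementation. The abstract does not state the inference procedure, priors, or feature definitions, so the collapsed Gibbs sampler, the symmetric Dirichlet hyperparameters `alpha` and `beta`, the left/right neighbour features, and the function names `context_features`, `log_predictive`, and `gibbs_induce` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def context_features(sentences):
    """Collect left/right neighbour counts per word type (assumed feature set)."""
    feats, vocab = {}, set()
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        for i in range(1, len(padded) - 1):
            counts = feats.setdefault(padded[i], Counter())
            for f in (("L", padded[i - 1]), ("R", padded[i + 1])):
                counts[f] += 1
                vocab.add(f)
    return feats, len(vocab)


def log_predictive(type_counts, class_counts, class_total, n_features, beta):
    """Log Dirichlet-multinomial probability of one type's feature counts,
    given the feature counts already assigned to a class."""
    n_type = sum(type_counts.values())
    lp = math.lgamma(class_total + n_features * beta)
    lp -= math.lgamma(class_total + n_type + n_features * beta)
    for f, n in type_counts.items():
        c = class_counts.get(f, 0)
        lp += math.lgamma(c + n + beta) - math.lgamma(c + beta)
    return lp


def gibbs_induce(type_features, n_classes, n_features,
                 alpha=1.0, beta=0.1, iters=50, seed=0):
    """Assign exactly one class to each word type by collapsed Gibbs sampling.
    Hyperparameters are illustrative assumptions, not values from the paper."""
    rng = random.Random(seed)
    types = sorted(type_features)
    z = {w: rng.randrange(n_classes) for w in types}
    class_size = [0] * n_classes
    class_counts = [Counter() for _ in range(n_classes)]
    class_total = [0] * n_classes
    for w in types:
        k = z[w]
        class_size[k] += 1
        class_counts[k].update(type_features[w])
        class_total[k] += sum(type_features[w].values())

    for _ in range(iters):
        for w in types:
            feats = type_features[w]
            n_w = sum(feats.values())
            # Remove the type, with all of its tokens' features, from its class.
            k_old = z[w]
            class_size[k_old] -= 1
            class_counts[k_old].subtract(feats)
            class_total[k_old] -= n_w
            # Score every class: class prior term plus feature likelihood.
            log_p = [
                math.log(class_size[k] + alpha)
                + log_predictive(feats, class_counts[k], class_total[k],
                                 n_features, beta)
                for k in range(n_classes)
            ]
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            k_new = rng.choices(range(n_classes), weights=weights)[0]
            # Reassign the whole type at once: one class per type.
            z[w] = k_new
            class_size[k_new] += 1
            class_counts[k_new].update(feats)
            class_total[k_new] += n_w
    return z
```

A call such as `feats, n = context_features(corpus)` followed by `gibbs_induce(feats, n_classes=3, n_features=n)` returns a dictionary mapping each word type to a single class index. Because the sampler moves a whole type together with all of its token-level counts, a type-level feature (for example a suffix count) could be added as a further multinomial term in the score without changing the structure of the loop.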