Paper: A Hierarchical Dirichlet Process Model for Joint Part-of-Speech and Morphology Induction

ACL ID N12-1045
Title A Hierarchical Dirichlet Process Model for Joint Part-of-Speech and Morphology Induction
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

In this paper we present a fully unsupervised nonparametric Bayesian model that jointly induces POS tags and morphological segmentations. The model is essentially an infinite HMM that infers the number of states from data. Incorporating segmentation into the same model provides morphological features to the system and eliminates the need to find them during a preprocessing step. We show that learning both tasks jointly actually leads to better results than learning either task with gold-standard data from the other task provided. The evaluation on multilingual data shows that the model produces state-of-the-art results on POS induction.
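
To illustrate what it means for a nonparametric model to infer the number of states from data, the following is a minimal sketch of a Chinese restaurant process, the sequential view of a Dirichlet process prior. It is not the paper's model or inference procedure, which the abstract does not specify; the function name `crp_assignments`, the concentration parameter `alpha`, and the seed are assumptions made purely for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Assign n_items to states with a Chinese restaurant process.

    Illustrative only: this shows a Dirichlet process prior over state
    assignments, not the joint POS and morphology model from the paper.
    alpha (> 0) is an assumed concentration parameter.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items currently in state k
    assignments = []   # assignments[i] = state index chosen for item i
    for _ in range(n_items):
        # An existing state k is chosen with weight counts[k];
        # a brand-new state is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new state
        else:
            counts[k] += 1     # reuse an existing state
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(n_items=200, alpha=1.0)
    print("number of states used:", len(counts))
    print("state sizes:", counts)
```

The number of states is never fixed in advance: it grows as items arrive, and larger values of `alpha` tend to produce more states. In an infinite HMM the same idea is applied to transition distributions, so the tag inventory size does not need to be set by hand.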