ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | N12-1045 |
---|---|
Title | A Hierarchical Dirichlet Process Model for Joint Part-of-Speech and Morphology Induction |
Venue | Annual Conference of the North American Chapter of the Association for Computational Linguistics |
Session | Main Conference |
Year | 2012 |
Authors |
In this paper we present a fully unsupervised nonparametric Bayesian model that jointly induces POS tags and morphological segmentations. The model is essentially an infinite HMM that infers the number of states from data. Incorporating segmentation into the same model provides morphological features to the system and eliminates the need to find them during a preprocessing step. We show that learning both tasks jointly actually leads to better results than learning either task with gold standard data from the other task provided. The evaluation on multilingual data shows that the model produces state-of-the-art results on POS induction.
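The abstract does not describe the inference procedure. The sketch below illustrates only the general idea of a nonparametric prior that lets the number of clusters (here standing in for HMM states) grow with the data. It uses a Chinese restaurant process, a standard construction related to the Dirichlet process. It is not the paper's model: the function name `crp_assignments`, the concentration parameter `alpha`, and the seed are all assumptions made for this example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Assign n_items to clusters with a Chinese restaurant process.

    Hypothetical helper for illustration; alpha (> 0) is the
    concentration parameter controlling how readily new clusters open.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of items currently in cluster k
    assignments = []   # assignments[i] = cluster index chosen for item i
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(n_items=200, alpha=1.0)
    print("clusters used:", len(counts))
```

The number of clusters is never fixed in advance: it is whatever the sampling process ends up using, which is the sense in which such a model "infers the number of states from data". A larger `alpha` tends to produce more clusters.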