Paper: Unsupervised Multilingual Learning for POS Tagging

ACL ID D08-1109
Title Unsupervised Multilingual Learning for POS Tagging
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008

We demonstrate the effectiveness of multilin- gual learning for unsupervised part-of-speech tagging. The key hypothesis of multilin- gual learning is that by combining cues from multiple languages, the structure of each be- comes more apparent. We formulate a hier- archical Bayesian model for jointly predicting bilingual streams of part-of-speech tags. The model learns language-specific features while capturing cross-lingual patterns in tag distri- bution for aligned words. Once the parame- ters of our model have been learned on bilin- gual parallel data, we evaluate its performance on a held-out monolingual test set. Our evalu- ation on six pairs of languages shows consis- tent and significant performance gains over a state-of-the-art monolingual baseline. For one language pair, we observ...