Paper: Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach

ACL ID N09-1010
Title Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach
Venue Human Language Technologies
Session Main Conference
Year 2009
Authors

We investigate the problem of unsupervised part-of-speech tagging when raw parallel data is available in a large number of languages. Patterns of ambiguity vary greatly across languages and therefore even unannotated multilingual data can serve as a learning signal. We propose a non-parametric Bayesian model that connects related tagging decisions across languages through the use of multilingual latent variables. Our experiments show that performance improves steadily as the number of languages increases.