Paper: Unsupervised Multilingual Grammar Induction

ACL ID P09-1009
Title Unsupervised Multilingual Grammar Induction
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009

We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov Chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-...