Paper: Learning Common Grammar from Multilingual Corpus

ACL ID P10-2034
Title Learning Common Grammar from Multilingual Corpus
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2010

We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language-dependent probabilistic context-free grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.
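
The generative story in the abstract (a common prior grammar, language-dependent PCFGs drawn from it, and sentences drawn from each PCFG) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy rule set `RULES`, the pseudo-counts in `COMMON_PRIOR`, the choice of a Dirichlet distribution to tie the languages together, and the helper names. The sketch covers only the forward sampling direction; the variational inference step is not shown.

```python
import random

# Hypothetical toy rule set shared by all languages (an assumption, not the
# paper's grammar). Each nonterminal maps to its candidate right-hand sides.
# Symbols that are not keys of RULES are treated as terminals.
RULES = {
    "S": [("NP", "VP"), ("VP", "NP")],   # subject-first vs. predicate-first
    "VP": [("V", "NP"), ("NP", "V")],    # verb-object vs. object-verb
    "NP": [("noun",)],
    "V": [("verb",)],
}

# Assumed common prior: Dirichlet pseudo-counts over the rules of each
# nonterminal, shared across languages.
COMMON_PRIOR = {
    "S": [2.0, 2.0],
    "VP": [2.0, 2.0],
    "NP": [1.0],
    "V": [1.0],
}


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_language_grammar(prior, rng):
    """Draw language-specific rule probabilities from the common prior."""
    return {lhs: sample_dirichlet(prior[lhs], rng) for lhs in RULES}


def generate(symbol, grammar, rng):
    """Expand a symbol top-down into a list of terminal tokens."""
    if symbol not in RULES:
        return [symbol]  # terminal symbol
    rhs = rng.choices(RULES[symbol], weights=grammar[symbol])[0]
    words = []
    for child in rhs:
        words.extend(generate(child, grammar, rng))
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    # Each language gets its own PCFG; sentences are generated independently,
    # so the resulting corpus is non-parallel.
    for language in ["lang_a", "lang_b", "lang_c"]:
        grammar = sample_language_grammar(COMMON_PRIOR, rng)
        sentence = generate("S", grammar, rng)
        print(language, grammar["S"], " ".join(sentence))
```

In this sketch the languages differ only in their rule probabilities (for example, how often the verb precedes its object), while the shared pseudo-counts play the role of the grammar common to all of them. Unsupervised learning would run in the opposite direction, estimating the shared prior and the per-language probabilities from raw sentences.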