Paper: Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data

ACL ID P14-1064
Title Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

Statistical phrase-based translation learns translation rules from bilingual corpora, and has traditionally only used monolingual evidence to construct features that rescore existing translation candidates. In this work, we present a semi-supervised graph-based approach for generating new translation rules that leverages bilingual and monolingual data. The proposed technique first constructs phrase graphs using both source and target language monolingual corpora. Next, graph propagation identifies translations of phrases that were not observed in the bilingual corpus, assuming that similar phrases have similar translations. We report results on a large Arabic-English system and a medium-sized Urdu-English system. Our proposed approach significantly improves the performance of com...