Paper: Latent Domain Phrase-based Models for Adaptation

ACL ID D14-1062
Title Latent Domain Phrase-based Models for Adaptation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014

Phrase-based models directly trained on mix-of-domain corpora can be sub-optimal. In this paper we equip phrase-based models with a latent domain variable and present a novel method for adapting them to an in-domain task represented by a seed corpus. We derive an EM algorithm which alternates between inducing domain-focused phrase pair estimates and inducing weights for mix-domain sentence pairs that reflect their relevance to the in-domain task. By embedding our latent domain phrase model in a sentence-level model and training the two in tandem, we are able to adapt all core translation components together: phrase, lexical and reordering. We show experiments on weighting sentence pairs for relevance as well as adapting phrase-based models, showing significant performance improvement in both task...
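The alternation described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's model: it assumes each sentence pair is just a bag of phrase-pair identifiers, uses a two-way latent domain (in-domain vs. out-of-domain) with one smoothed multinomial per domain, and ignores the lexical and reordering components entirely. The function names `em_domain_weights` and `normalise`, and the `smoothing` parameter, are hypothetical and chosen only for this illustration.

```python
import math
from collections import defaultdict


def normalise(counts, vocab, smoothing):
    """Turn fractional counts into an additively smoothed distribution over vocab."""
    total = sum(counts[pp] for pp in vocab) + smoothing * len(vocab)
    return {pp: (counts[pp] + smoothing) / total for pp in vocab}


def em_domain_weights(seed, mixed, iterations=10, smoothing=0.1):
    """Toy EM over a latent binary domain variable (an assumed simplification).

    seed:  in-domain sentence pairs, each a list of phrase-pair ids
    mixed: mix-domain sentence pairs in the same representation
    Returns one weight per mix-domain sentence pair: the posterior
    probability that the pair belongs to the in-domain task.
    """
    vocab = {pp for sent in seed + mixed for pp in sent}
    weights = [0.5] * len(mixed)  # uninformed starting point

    for _ in range(iterations):
        # M-step: domain-focused phrase-pair estimates from fractional counts.
        counts_in = defaultdict(float)
        counts_out = defaultdict(float)
        for sent in seed:  # seed pairs are treated as fully in-domain
            for pp in sent:
                counts_in[pp] += 1.0
        for sent, w in zip(mixed, weights):
            for pp in sent:
                counts_in[pp] += w
                counts_out[pp] += 1.0 - w
        p_in = normalise(counts_in, vocab, smoothing)
        p_out = normalise(counts_out, vocab, smoothing)
        # Add-one smoothed domain prior, kept strictly inside (0, 1).
        prior_in = (sum(weights) + 1.0) / (len(mixed) + 2.0)

        # E-step: relevance weight of each mix-domain sentence pair.
        new_weights = []
        for sent in mixed:
            log_in = math.log(prior_in) + sum(math.log(p_in[pp]) for pp in sent)
            log_out = math.log(1.0 - prior_in) + sum(math.log(p_out[pp]) for pp in sent)
            top = max(log_in, log_out)  # subtract the max to avoid underflow
            num = math.exp(log_in - top)
            new_weights.append(num / (num + math.exp(log_out - top)))
        weights = new_weights

    return weights
```

In this sketch the seed corpus is what breaks the symmetry between the two domains: mix-domain sentence pairs that share phrase pairs with the seed drift toward a weight of 1, while the others drift toward 0. A full system in the spirit of the abstract would additionally tie these sentence-level weights to the lexical and reordering statistics, which this example does not attempt.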