Paper: Domain Adaptation for Machine Translation by Mining Unseen Words

ACL ID P11-2071
Title Domain Adaptation for Machine Translation by Mining Unseen Words
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise out-of-vocabulary (OOV) terms. We show several approaches to integrating such translations into a phrase-based translation system, yielding consistent improvements in translation quality (between 0.5 and 1.5 BLEU points) on four domains and two language pairs.
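To make the notion of "unseen words" concrete, the following is a minimal sketch of how an OOV rate could be measured when moving from a training domain to a new one. It is an illustration only and not the paper's method: the function name `oov_stats`, the whitespace tokenization, and the choice to report both type-level and token-level rates are all assumptions made for this example.

```python
from collections import Counter


def oov_stats(train_sentences, test_sentences):
    """Measure how many test-domain words never occur in the training data.

    Hypothetical helper: tokenization is plain whitespace splitting,
    which is an assumption made for this sketch.
    Returns (type_rate, token_rate, oov_counts).
    """
    # Vocabulary observed in the training (old-domain) source text.
    train_vocab = {tok for sent in train_sentences for tok in sent.split()}

    # All tokens of the new-domain text.
    test_tokens = [tok for sent in test_sentences for tok in sent.split()]

    # Count each test token that is absent from the training vocabulary.
    oov_counts = Counter(tok for tok in test_tokens if tok not in train_vocab)

    test_types = set(test_tokens)
    # Fraction of distinct test words that are unseen.
    type_rate = len(oov_counts) / len(test_types) if test_types else 0.0
    # Fraction of running test words that are unseen.
    token_rate = sum(oov_counts.values()) / len(test_tokens) if test_tokens else 0.0
    return type_rate, token_rate, oov_counts
```

The returned `oov_counts` lists the unseen words by frequency, which is the set of terms for which mined translations would be needed.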