Paper: Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models

ACL ID D12-1047
Title Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012
Authors

In this paper, we propose a novel translation model (TM) based cross-lingual data selection model for language model (LM) adaptation in statistical machine translation (SMT), from word models to phrase models. Given a source sentence in the translation task, this model directly estimates the probability that a sentence in the target LM training corpus is similar. Compared with the traditional approaches that utilize the first-pass translation hypotheses, the cross-lingual data selection model avoids the problem of noisy proliferation. Furthermore, the phrase TM based cross-lingual data selection model is more effective than the traditional approaches based on bag-of-words models and word-based TM, because it captures contextual information in modeling the selection of a phrase as a whole...
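
To make the word-model end of this idea concrete, the following is a minimal sketch of cross-lingual data selection, not the paper's actual formulation, which the abstract does not spell out. It assumes an IBM Model 1 style score: each target word is explained by the average lexical translation probability over the source words, and candidate sentences from the target LM corpus are ranked by their length-normalized log-probability. The names `word_tm_score`, `select`, `lex`, and `floor`, as well as the toy probabilities, are hypothetical and chosen only for illustration.

```python
import math


def word_tm_score(source, target, lex, floor=1e-6):
    """Length-normalized log-probability of `target` given `source`.

    Assumption: an IBM Model 1 style word model in which
    p(t | source) is the mean of lex[(t, s)] over all source words s.
    `lex` maps (target_word, source_word) to a translation probability.
    `floor` keeps unseen words from producing log(0).
    """
    if not source or not target:
        return float("-inf")
    total = 0.0
    for t in target:
        p = sum(lex.get((t, s), 0.0) for s in source) / len(source)
        total += math.log(max(p, floor))
    return total / len(target)


def select(source, corpus, lex, k=1):
    """Return the k target-corpus sentences scored most similar to `source`."""
    ranked = sorted(
        corpus,
        key=lambda sent: word_tm_score(source, sent, lex),
        reverse=True,
    )
    return ranked[:k]


# Toy lexical table (hypothetical values, French source to English target).
lex = {
    ("the", "la"): 0.7,
    ("house", "maison"): 0.8,
    ("blue", "bleue"): 0.9,
}
source = ["la", "maison"]
corpus = [["the", "house"], ["a", "blue", "car"]]
best = select(source, corpus, lex, k=1)
```

Because the score is computed directly from the source sentence, no first-pass translation hypothesis is needed, which is the property the abstract contrasts with the traditional approaches. A phrase-based variant would replace the per-word lookup with probabilities over multi-word units; that extension is not sketched here.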