Paper: Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation

ACL ID P13-2119
Title Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2013
Authors

Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard n-gram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neura...
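
The selection idea described in the abstract (score general-domain sentences with a language model trained on in-domain text, then keep the most similar ones) can be illustrated with a minimal sketch. This is not the paper's method: an add-one-smoothed unigram model stands in for the n-gram and neural language models the abstract refers to, and the function names (`train_unigram`, `cross_entropy`, `select_similar`), the whitespace tokenization, and the ranking by per-token cross-entropy are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count tokens in whitespace-tokenized sentences (hypothetical helper)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, counts, total, vocab_size):
    """Per-token cross-entropy (nats) under an add-one-smoothed unigram model."""
    toks = sentence.split()
    if not toks:
        return float("inf")  # empty sentences are ranked last
    logp = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in toks)
    return -logp / len(toks)


def select_similar(in_domain, general, k):
    """Return the k general-domain sentences with the lowest cross-entropy
    under the in-domain model (assumed ranking criterion)."""
    counts, total = train_unigram(in_domain)
    # Vocabulary covers both corpora so unseen general-domain words get mass.
    vocab = set(counts) | {t for s in general for t in s.split()}
    ranked = sorted(
        general, key=lambda s: cross_entropy(s, counts, total, len(vocab))
    )
    return ranked[:k]


in_domain = ["the patient received a dose", "the dose was reduced"]
general = [
    "the patient was given a dose",
    "stock markets fell sharply today",
    "the dose was doubled",
]
selected = select_similar(in_domain, general, k=2)
```

In this toy setup the two medical-sounding sentences are kept and the finance sentence is dropped. A unigram model gives every unseen word the same smoothed probability regardless of its context, which is the kind of limitation the abstract's hypothesis about continuous word representations is aimed at.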