Paper: Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets

ACL ID C10-2075
Title Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets
Venue International Conference on Computational Linguistics
Session Poster Session
Year 2010
Authors

An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence fragments that potentially compete to be the translation of the same source-language fragment. Using this grammar, a set of impostor sentences is then created for each English sentence to simulate confusions that would arise if the system were to process an (unavailable) input whose correct English translation is that sentence. An LM is then trained to discriminate between the original sentences and the impostors. The procedure is applied to the IWSLT Chinese-to-English translation task,...
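
The idea of simulated confusion sets can be illustrated with a minimal Python sketch. It is not the paper's implementation: a small hand-written word substitution table (the hypothetical `CONFUSIONS`) stands in for the English-to-English synchronous context-free grammar derived from the MT system, and a plain perceptron over unigram and bigram counts stands in for the discriminative LM. All function names and the toy data are assumptions made for illustration.

```python
from itertools import product

# Hypothetical confusion table (assumption): each English word maps to words
# that could compete with it as translations of the same source fragment.
# The paper derives such alternatives from a grammar; this table is a toy.
CONFUSIONS = {
    "big": ["large", "great"],
    "house": ["home", "building"],
}


def impostors(sentence, confusions=CONFUSIONS):
    """Return every sentence obtained by swapping in competing words."""
    tokens = sentence.split()
    options = [[tok] + confusions.get(tok, []) for tok in tokens]
    candidates = {" ".join(choice) for choice in product(*options)}
    candidates.discard(sentence)  # the original is not its own impostor
    return sorted(candidates)


def ngram_features(sentence, max_n=2):
    """Count the unigrams and bigrams of a sentence, with boundary markers."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    feats = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            feats[gram] = feats.get(gram, 0) + 1
    return feats


def score(weights, sentence):
    """Linear model score: dot product of weights and n-gram counts."""
    feats = ngram_features(sentence)
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(sentences, epochs=5):
    """Perceptron updates that push each original above its best impostor."""
    weights = {}
    for _ in range(epochs):
        for original in sentences:
            rivals = impostors(original)
            if not rivals:
                continue
            best = max(rivals, key=lambda s: score(weights, s))
            # Update whenever an impostor ties with or beats the original.
            if score(weights, best) >= score(weights, original):
                for f, v in ngram_features(original).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in ngram_features(best).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


if __name__ == "__main__":
    model = train_perceptron(["the big house"])
    for s in ["the big house"] + impostors("the big house"):
        print(f"{score(model, s):6.1f}  {s}")
```

Training needs only monolingual English text, which is what makes the procedure unsupervised: the impostors supply the negative examples that would otherwise come from real MT output paired with reference translations.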