Paper: A Large Scale Distributed Syntactic Semantic and Lexical Language Model for Machine Translation

ACL ID P11-1021
Title A Large Scale Distributed Syntactic Semantic and Lexical Language Model for Machine Translation
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2011
Authors

This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov ran- dom field paradigm. The composite language model has been trained by performing a con- vergent N-best list approximate EM algorithm that has linear time complexity and a follow- up EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over n- grams and achieves significantly better trans- lation quality measured by the BLEU score and “readability” when applied to the task of ...