Paper: Class-Based Language Modeling for Translating into Morphologically Rich Languages

ACL ID C14-1181
Title Class-Based Language Modeling for Translating into Morphologically Rich Languages
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

Class-based language modeling (LM) is a long-studied and effective approach to overcome data sparsity in the context of n-gram model training. In statistical machine translation (SMT), different forms of class-based LMs have been shown to improve baseline translation quality when used in combination with standard word-level LMs but no published work has systematically compared different kinds of classes, model forms and LM combination methods in a unified SMT setting. This paper aims to fill these gaps by focusing on the challenging problem of translating into Russian, a language with rich inflectional morphology and complex agreement phenomena. We conduct our evaluation in a large-data scenario and report statistically significant BLEU improvements of up to 0.6 points when using a r...