Paper: Effective Use of Linguistic and Contextual Information for Statistical Machine Translation

ACL ID D09-1008
Title Effective Use of Linguistic and Contextual Information for Statistical Machine Translation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

Current methods of using lexical features in machine translation have difficulty in scaling up to realistic MT tasks due to a prohibitively large number of parameters involved. In this paper, we propose methods of using new linguistic and contextual features that do not suffer from this problem and apply them in a state-of-the-art hierarchical MT system. The features used in this work are non-terminal labels, non-terminal length distribution, source string context and source dependency LM scores. The effectiveness of our techniques is demonstrated by significant improvements over a strong baseline. On Arabic-to-English translation, improvements in lower-cased BLEU are 2.0 on NIST MT06 and 1.7 on MT08 newswire data on decoding output. On Chinese-to-English translation, the im- ...