Paper: Improving Statistical MT Through Morphological Analysis

ACL ID H05-1085
Title Improving Statistical MT Through Morphological Analysis
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2005

In statistical machine translation, estimating word-to-word alignment probabilities for the translation model can be difficult due to the problem of sparse data: most words in a given corpus occur at most a handful of times. With a highly inflected language such as Czech, this problem can be particularly severe. In addition, much of the morphological variation seen in Czech words is not reflected in either the morphology or syntax of a language like English. In this work, we show that using morphological analysis to modify the Czech input can improve a Czech-English machine translation system. We investigate several different methods of incorporating morphological information, and show that a system that combines these methods yields the best results. Our final system achieves a BLEU...