Paper: Segment Choice Models: Feature-Rich Models For Global Distortion In Statistical Machine Translation

ACL ID N06-1004
Title Segment Choice Models: Feature-Rich Models For Global Distortion In Statistical Machine Translation
Venue Human Language Technologies
Session Main Conference
Year 2006
Authors

This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to every possible phrase reordering. These “segment choice” models (SCMs) can be trained on “segment-aligned” sentence pairs and applied either during decoding or during rescoring. The approach also yields a metric called “distortion perplexity” (“disperp”) for comparing SCMs offline on test data, analogous to perplexity for language models. A decision-tree-based SCM is tested on Chinese-to-English translation and outperforms a baseline distortion penalty approach at the 99% confidence level.
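The following minimal Python sketch makes the "sequence of choices" view and the disperp metric concrete. It is not the authors' decision-tree SCM. `uniform_scm` and `leftmost_biased_scm` are hypothetical toy models, and the `(remaining, history)` interface is an assumption. Disperp is assumed to be normalized per segment choice, by analogy with per-word language-model perplexity, and the paper's exact normalization may differ. The sketch shows how chaining per-step choice probabilities gives a global probability for a complete reordering.

```python
import math
from typing import Dict, List, Sequence


def uniform_scm(remaining: Sequence[int], history: Sequence[int]) -> Dict[int, float]:
    """Hypothetical baseline SCM: every uncovered source segment is equally likely next."""
    p = 1.0 / len(remaining)
    return {seg: p for seg in remaining}


def leftmost_biased_scm(remaining: Sequence[int], history: Sequence[int],
                        p_left: float = 0.7) -> Dict[int, float]:
    """Hypothetical toy SCM: prefer the leftmost uncovered segment (monotone order)."""
    if len(remaining) == 1:
        return {remaining[0]: 1.0}
    rest = (1.0 - p_left) / (len(remaining) - 1)
    dist = {seg: rest for seg in remaining}
    dist[min(remaining)] = p_left
    return dist


def log_prob_of_order(model, order: List[int]) -> float:
    """Log-probability an SCM assigns to one complete segment order.

    `order` lists source-segment indices in the order they are translated.
    Each step is one choice among the segments not yet covered, so the
    probability of the whole reordering is the product of the step probabilities.
    """
    remaining = sorted(order)
    history: List[int] = []
    total = 0.0
    for seg in order:
        dist = model(remaining, history)
        total += math.log(dist[seg])
        remaining.remove(seg)
        history.append(seg)
    return total


def disperp(model, orders: List[List[int]]) -> float:
    """Distortion perplexity over segment-aligned test sentences.

    Assumption: normalized per segment choice, like per-word LM perplexity.
    """
    total_log_prob = 0.0
    n_choices = 0
    for order in orders:
        total_log_prob += log_prob_of_order(model, order)
        n_choices += len(order)
    return math.exp(-total_log_prob / n_choices)


# Two toy "segment-aligned" sentences: source segment indices in target order.
orders = [[0, 1, 2], [1, 0, 2, 3]]
print(disperp(uniform_scm, orders))          # equals (3! * 4!) ** (1/7)
print(disperp(leftmost_biased_scm, orders))  # lower: this model fits near-monotone orders better
```

As with language-model perplexity, a lower disperp means the model assigns more probability to the reorderings actually observed in the test data. Comparing two SCMs offline therefore requires no decoding run.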