Paper: Improved Statistical Machine Translation Using Paraphrases

ACL ID N06-1003
Title Improved Statistical Machine Translation Using Paraphrases
Venue Human Language Technologies
Session Main Conference
Year 2006

Parallel corpora are crucial for training SMT systems. However, for many language pairs they are available only in very limited quantities. For these language pairs a huge portion of phrases encountered at run-time will be unknown. We show how techniques from paraphrasing can be used to deal with these otherwise unknown source language phrases. Our results show that augmenting a state-of-the-art SMT system with paraphrases leads to significantly improved coverage and translation quality. For a training corpus with 10,000 sentence pairs we increase the coverage of unique test set unigrams from 48% to 90%, with more than half of the newly covered items accurately translated, as opposed to none in current approaches.
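
The coverage figure quoted above can be read as the fraction of unique test set unigrams for which the system has some translation. The following is a minimal sketch of that calculation on toy data, not the paper's implementation. The function names, the toy vocabulary, and the rule that a word counts as covered when at least one of its paraphrases is already known are all assumptions made for illustration.

```python
def unigram_coverage(test_sentences, known_words):
    """Fraction of unique test-set unigrams found in known_words."""
    unique = {tok for sent in test_sentences for tok in sent.split()}
    if not unique:
        return 0.0
    return len(unique & set(known_words)) / len(unique)


def augment_with_paraphrases(known_words, paraphrases):
    """Add every word that has at least one paraphrase in known_words.

    paraphrases maps a source word to a list of candidate paraphrases
    (a hypothetical structure chosen for this sketch).
    """
    known = set(known_words)
    reachable = {
        word
        for word, alternatives in paraphrases.items()
        if any(alt in known for alt in alternatives)
    }
    return known | reachable


# Toy data: five unique test unigrams, two of them known to the baseline.
test_sentences = ["la casa verde", "velar por la casa"]
known_words = {"la", "casa", "garantizar"}
paraphrases = {
    "velar": ["garantizar"],  # paraphrase is known, so "velar" becomes covered
    "verde": ["esmeralda"],   # paraphrase is unknown, so "verde" stays uncovered
}

baseline = unigram_coverage(test_sentences, known_words)
augmented = unigram_coverage(
    test_sentences, augment_with_paraphrases(known_words, paraphrases)
)
print(f"baseline coverage:  {baseline:.0%}")
print(f"augmented coverage: {augmented:.0%}")
```

Note that coverage only measures whether a translation is available. Whether the newly covered items are translated accurately is a separate question, which the abstract reports separately.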