Paper: Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases

ACL ID D09-1040
Title Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

Untranslated words still constitute a major problem for Statistical Machine Translation (SMT), and current SMT systems are limited by the quantity of parallel training texts. Augmenting the training data with paraphrases generated by pivoting through other languages alleviates this problem, especially for the so-called “low density” languages. But pivoting requires additional parallel texts. We address this problem by deriving paraphrases monolingually, using distributional semantic similarity measures, thus providing access to larger training resources, such as comparable and unrelated monolingual corpora. We present what is to our knowledge the first successful integration of a collocational approach to untranslated words with an end-to-end, state of the art SMT syste...
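
To make the idea of monolingually derived paraphrases concrete, the following is a minimal sketch of distributional similarity in general, not the paper's actual procedure. It assumes a simple co-occurrence window and cosine similarity; the abstract does not state which similarity measures, window sizes, or filtering steps the authors used. The function names (`context_vectors`, `cosine`, `paraphrase_candidates`) and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each token to counts of the tokens seen within `window` positions.

    Assumption: whitespace tokenization and a symmetric window; a real
    system would likely use far larger corpora and better preprocessing.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def paraphrase_candidates(word, vectors, top_n=3):
    """Rank other words by how similar their contexts are to those of `word`."""
    target = vectors.get(word)
    if not target:
        return []
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy monolingual corpus (hypothetical data, for illustration only).
corpus = [
    "the doctor examined the patient carefully",
    "the physician examined the patient carefully",
    "the doctor treated the sick child",
    "the physician treated the sick child",
    "the dog chased the ball outside",
]

vectors = context_vectors(corpus)
candidates = paraphrase_candidates("doctor", vectors)
for other, score in candidates:
    print(f"{other}\t{score:.3f}")
```

In this toy corpus, "physician" ranks first for "doctor" because the two words occur in identical contexts. In an SMT setting, a candidate like this could stand in for a source word that has no entry in the translation model, which is the use the abstract describes; how candidates are scored and integrated into the system is not recoverable from this excerpt.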