Paper: Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

ACL ID P11-2051
Title Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. The rules are then used to generate new sentence pairs. An SVM classifier is built to filter the generated sentence pairs. The filtered corpus is used for training phrase-based translation models, which can be used directly in translation tasks or combined with baseline models. Experimental results on Chinese-English machine translation tasks show an average improvement of 0.45 BLEU and 1.22 TER points across 5 different NIST test sets.
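The rule extraction and generation steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the SRL annotation is on the English side, that each role filler already comes with its aligned Chinese phrase, and that a rule is simply "fillers of the same predicate and role are interchangeable". The names `Filler`, `Pair`, `extract_rules`, and `expand` are hypothetical, the two sentence pairs are invented for illustration, and the SVM filtering step is omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class Filler(NamedTuple):
    role: str  # semantic role label, e.g. "ARG0" (assumed label set)
    src: str   # phrase on the SRL-annotated side (assumed: English)
    tgt: str   # aligned phrase on the other side (assumed: Chinese)


class Pair(NamedTuple):
    predicate: str
    src: str
    tgt: str
    fillers: tuple


def extract_rules(pairs):
    """Collect, per (predicate, role), every aligned filler seen in the corpus."""
    rules = defaultdict(set)
    for p in pairs:
        for f in p.fillers:
            rules[(p.predicate, f.role)].add((f.src, f.tgt))
    return rules


def expand(pairs, rules):
    """Swap each filler for other fillers of the same predicate and role."""
    existing = {(p.src, p.tgt) for p in pairs}
    generated = set()
    for p in pairs:
        for f in p.fillers:
            for new_src, new_tgt in rules[(p.predicate, f.role)]:
                # Replace the filler on both sides so the pair stays parallel.
                candidate = (
                    p.src.replace(f.src, new_src, 1),
                    p.tgt.replace(f.tgt, new_tgt, 1),
                )
                if candidate not in existing:
                    generated.add(candidate)
    return sorted(generated)


# Invented toy corpus: two sentence pairs sharing the predicate "buy".
pairs = [
    Pair("buy", "the company bought a factory", "公司 购买 了 工厂",
         (Filler("ARG0", "the company", "公司"),
          Filler("ARG1", "a factory", "工厂"))),
    Pair("buy", "the government bought new equipment", "政府 购买 了 新 设备",
         (Filler("ARG0", "the government", "政府"),
          Filler("ARG1", "new equipment", "新 设备"))),
]

new_pairs = expand(pairs, extract_rules(pairs))
for src, tgt in new_pairs:
    print(src, "|||", tgt)
```

In a full pipeline of the kind the abstract describes, the candidates in `new_pairs` would then be scored by a classifier and only the accepted pairs added to the training data; plain string replacement is used here only to keep the sketch short.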