Paper: Robust Machine Translation Evaluation with Entailment Features

ACL ID P09-1034
Title Robust Machine Translation Evaluation with Entailment Features
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009

Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: a good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap. We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings. The combination metric outperforms the individual scores, but is bested by the entailment-based metric. Combinin...
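To make the evaluation setup concrete, the sketch below shows how several metric scores can be linearly combined and then assessed by their Pearson correlation with human quality judgments, the robustness criterion the abstract refers to. All scores, weights, and sentence counts here are hypothetical illustrations, not data or parameters from the paper (TER is negated in the combination since lower TER is better).

```python
# Hedged sketch: linearly combining MT metric scores and measuring
# Pearson correlation with human judgments. All numbers are invented
# for illustration; the paper's actual combination is learned, not fixed.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-sentence scores for four metrics plus human ratings.
bleu   = [0.31, 0.52, 0.18, 0.44, 0.60]
nist   = [4.10, 6.20, 3.00, 5.50, 7.10]
ter    = [0.55, 0.32, 0.70, 0.40, 0.25]   # lower is better
meteor = [0.40, 0.61, 0.25, 0.50, 0.68]
human  = [2.5, 4.0, 1.5, 3.5, 4.5]        # e.g. 1-5 adequacy scale

# Simple fixed-weight linear combination (weights are assumptions).
weights = {"bleu": 1.0, "nist": 0.1, "ter": -1.0, "meteor": 1.0}
combined = [weights["bleu"] * b + weights["nist"] * n
            + weights["ter"] * t + weights["meteor"] * m
            for b, n, t, m in zip(bleu, nist, ter, meteor)]

print(round(pearson(combined, human), 3))
```

A combined metric is judged "more robust" when this correlation stays high across different language pairs and genres, rather than on a single test set.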