Paper: The Best Lexical Metric for Phrase-Based Statistical MT System Optimization

ACL ID N10-1080
Title The Best Lexical Metric for Phrase-Based Statistical MT System Optimization
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

Translation systems are generally trained to optimize BLEU, but many alternative metrics are available. We explore how optimizing toward various automatic evaluation metrics (BLEU, METEOR, NIST, TER) affects the resulting model. We train a state-of-the-art MT system using MERT on many parameterizations of each metric and evaluate the resulting models on the other metrics and also using human judges. In accordance with popular wisdom, we find that it’s important to train on the same metric used in testing. However, we also find that training to a newer metric is only useful to the extent that the MT model’s structure and features allow it to take advantage of the metric. Contrasting with TER’s good correlation with human judgments, we show that people tend to prefer BLEU and NIS...