Paper: Better Evaluation Metrics Lead to Better Machine Translation

ACL ID D11-1035
Title Better Evaluation Metrics Lead to Better Machine Translation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011

Many machine translation evaluation metrics have been proposed after the seminal BLEU metric, and many among them have been found to consistently outperform BLEU, demonstrated by their better correlations with human judgment. It has long been the hope that by tuning machine translation systems against these new generation metrics, advances in automatic machine translation evaluation can lead directly to advances in automatic machine translation. However, to date there has been no unambiguous report that these new metrics can improve a state-of-the-art machine translation system over its BLEU-tuned baseline. In this paper, we demonstrate that tuning Joshua, a hierarchical phrase-based statistical machine translation system, with the TESLA metrics results in significantly better hu...