Paper: Correlating Human and Automatic Evaluation of a German Surface Realiser

ACL ID P09-2025
Title Correlating Human and Automatic Evaluation of a German Surface Realiser
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2009
Authors

We examine correlations between native speaker judgements on automatically generated German text and automatic evaluation metrics. We look at a number of metrics from the MT and Summarisation communities and find that for a relative ranking task, most automatic metrics perform equally well and have fairly strong correlations to the human judgements. In contrast, on a naturalness judgement task, the General Text Matcher (GTM) tool correlates best overall, although in general, correlation between the human judgements and the automatic metrics was quite weak.
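To illustrate what correlating human judgements with an automatic metric can look like in practice, the following is a minimal sketch that computes a Spearman rank correlation between two lists of scores. The abstract does not say which correlation coefficient was used, so the choice of Spearman is an assumption, and the function names (`average_ranks`, `pearson`, `spearman`) and the toy scores are hypothetical, invented only for this example; they are not data or results from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy scores for five generated sentences (not from the paper).
human_scores = [1, 2, 3, 4, 5]             # e.g. naturalness ratings
metric_scores = [0.2, 0.1, 0.5, 0.4, 0.9]  # e.g. an automatic metric

rho = spearman(human_scores, metric_scores)
print(round(rho, 3))
```

A rank-based coefficient is a natural fit for a relative ranking task because it depends only on the ordering of the items, not on the scale of the metric scores. A value near 1 would indicate that the metric orders the outputs much as the human judges do, while a value near 0 would indicate little agreement.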