ACL Anthology Network (All About NLP)
| ACL ID | D11-1042 |
|---|---|
| Title | Corroborating Text Evaluation Results with Heterogeneous Measures |
| Venue | Conference on Empirical Methods in Natural Language Processing |
| Session | Main Conference |
| Year | 2011 |
| Authors | |
Automatically produced texts (e.g. translations or summaries) are usually evaluated with n-gram based measures such as BLEU or ROUGE, while the wide set of more sophisticated measures proposed in recent years remains largely ignored for practical purposes. In this paper we first present an in-depth analysis of the state of the art in order to clarify this issue. After this, we formalize and verify empirically a set of properties that every text evaluation measure based on similarity to human-produced references satisfies. These properties imply that corroborating system improvements with additional measures always increases the overall reliability of the evaluation process. In addition, the greater the heterogeneity of the measures (which is measurable) the high...
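To make the "similarity to human-produced references" idea concrete, here is a minimal sketch of the clipped n-gram precision that underlies measures such as BLEU. The function names are illustrative, not from the paper; this is a single precision component, not the full BLEU score (which combines several n-gram orders and a brevity penalty).

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_ngram_precision(candidate, references, n):
    # Clipped n-gram precision as in BLEU: each candidate n-gram counts
    # at most as often as it occurs in the most favorable reference.
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

candidate = "the cat sat on the mat".split()
references = ["the cat is on the mat".split()]
# 5 of the 6 candidate unigrams are covered by the reference
print(modified_ngram_precision(candidate, references, 1))  # → 0.8333...
```

Corroborating a system improvement with heterogeneous measures, as the paper proposes, would mean checking that the improvement also holds under measures built on different evidence than this surface n-gram overlap.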