Paper: Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

ACL ID P12-2070
Title Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2012
Authors

We investigate the consistency of human assessors involved in summarization evaluation to understand its effect on system ranking and automatic evaluation techniques. Using Text Analysis Conference data, we measure annotator consistency based on human scoring of summaries for Responsiveness, Readability, and Pyramid scoring. We identify inconsistencies in the data and measure to what extent these inconsistencies affect the ranking of automatic summarization systems. Finally, we examine the stability of automatic metrics (ROUGE and CLASSY) with respect to the inconsistent assessments.
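For illustration only: one common way to quantify how strongly a change in assessments perturbs a system ranking is a rank correlation such as Kendall's tau between the rankings induced by the two sets of scores. The sketch below uses hypothetical per-system scores (`original_scores`, `corrected_scores`) and is not the paper's own procedure.

```python
"""Hypothetical sketch: compare system rankings induced by two assessment
conditions (e.g., original vs. corrected human scores) via Kendall's tau.
All system names and score values are illustrative, not from the paper."""
from scipy.stats import kendalltau

# Mean human score per system under the two assessment conditions (made up).
original_scores = {"sysA": 3.4, "sysB": 2.9, "sysC": 3.1, "sysD": 2.5}
corrected_scores = {"sysA": 3.2, "sysB": 3.0, "sysC": 3.1, "sysD": 2.4}

# Align the two score vectors by system name and correlate them.
systems = sorted(original_scores)
tau, p_value = kendalltau(
    [original_scores[s] for s in systems],
    [corrected_scores[s] for s in systems],
)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```

A tau close to 1 would indicate that the inconsistencies barely change the relative ordering of systems; lower values would signal a ranking that is sensitive to the assessment differences.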