Paper: Correlation between ROUGE and Human Evaluation of Extractive Meeting Summaries

ACL ID P08-2051
Title Correlation between ROUGE and Human Evaluation of Extractive Meeting Summaries
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2008
Authors

Automatic summarization evaluation is critical to the development of summarization systems. While ROUGE has been shown to correlate well with human evaluation for content match in text summarization, the multiparty meeting domain has many characteristics that may pose potential problems to ROUGE. In this paper, we carefully examine how well ROUGE scores correlate with human evaluation for extractive meeting summarization. Our experiments show that the correlation is generally rather low, but that a significantly better correlation can be obtained by accounting for several unique meeting characteristics, such as disfluencies and speaker information, especially when evaluating system-generated summaries.
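The evaluation procedure the abstract describes can be sketched as follows: score each summary against a reference with a ROUGE-style n-gram recall, then measure the rank correlation between those scores and human judgments. The sketch below is a minimal, illustrative simplification, assuming ROUGE-1 recall and Spearman correlation without tie handling; the paper itself uses the official ROUGE toolkit, and the function names here are our own.

```python
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 recall: clipped unigram overlap divided by
    the number of unigrams in the reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())

def spearman(xs: list, ys: list) -> float:
    """Spearman rank correlation between two score lists.
    Illustrative only: assumes no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy example: ROUGE scores for three system summaries vs. (hypothetical)
# human content scores; the correlation is what the paper investigates.
references = ["the team agreed to ship the prototype next week"] * 3
systems = [
    "the team agreed to ship the prototype next week",
    "the team will ship a prototype",
    "uh you know the meeting ended",
]
rouge_scores = [rouge1_recall(s, r) for s, r in zip(systems, references)]
human_scores = [5.0, 3.0, 1.0]  # hypothetical human ratings
correlation = spearman(rouge_scores, human_scores)
```

One of the paper's adjustments, removing disfluencies (e.g. "uh", "you know") before scoring, would amount here to filtering such tokens out of the candidate and reference strings before calling `rouge1_recall`.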