Paper: Quantifying the Limits and Success of Extractive Summarization Systems Across Domains

ACL ID N10-1133
Title Quantifying the Limits and Success of Extractive Summarization Systems Across Domains
Venue Human Language Technologies
Session Main Conference
Year 2010
Authors

This paper analyzes the topic identification stage of single-document automatic text summarization across four different domains, consisting of newswire, literary, scientific and legal documents. We present a study that explores the summary space of each domain via an exhaustive search strategy, and finds the probability density function (pdf) of the ROUGE score distributions for each domain. We then use this pdf to calculate the percentile rank of extractive summarization systems. Our results introduce a new way to judge the success of automatic summarization systems and bring quantified explanations to questions such as why it was so hard for the systems to date to have a statistically significant improvement over the lead baseline in the news domain.
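
The idea of exhaustively scoring the summary space and ranking a system against it can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy document, and the simplified unigram-recall stand-in for ROUGE are all assumptions made for illustration, and the percentile rank here is taken from the empirical score distribution rather than a fitted pdf.

```python
from collections import Counter
from itertools import combinations


def rouge1_recall(candidate_tokens, reference_tokens):
    """Simplified unigram recall (an assumed stand-in for ROUGE-1)."""
    if not reference_tokens:
        return 0.0
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Clipped overlap: each reference token is matched at most as often
    # as it occurs in the candidate.
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / len(reference_tokens)


def summary_space_scores(sentences, reference, k):
    """Score every k-sentence extract of a document (exhaustive search)."""
    ref_tokens = reference.lower().split()
    scores = {}
    for idx in combinations(range(len(sentences)), k):
        tokens = " ".join(sentences[i] for i in idx).lower().split()
        scores[idx] = rouge1_recall(tokens, ref_tokens)
    return scores


def percentile_rank(score, all_scores):
    """Percentage of extracts that score strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


# Hypothetical toy document and reference summary.
sentences = [
    "the cat sat on the mat",
    "dogs bark loudly",
    "the mat was red",
    "birds fly south",
]
reference = "the cat sat on the red mat"

scores = summary_space_scores(sentences, reference, k=2)

# The lead baseline takes the first k sentences of the document.
lead_rank = percentile_rank(scores[(0, 1)], list(scores.values()))
print(lead_rank)  # 50.0
```

In this toy case the lead extract outscores half of the six possible two-sentence extracts. The number of extracts grows combinatorially with document length, so a plain enumeration like this one is only practical for short documents.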