Paper: Discovery of Topically Coherent Sentences for Extractive Summarization

ACL ID P11-1050
Title Discovery of Topically Coherent Sentences for Extractive Summarization
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

Extractive methods for multi-document summarization are mainly governed by information overlap, coherence, and content constraints. We present an unsupervised probabilistic approach that models the hidden abstract concepts across documents, as well as the correlation between these concepts, to generate topically coherent and non-redundant summaries. Based on human evaluations, our models generate summaries with higher linguistic quality in terms of coherence, readability, and redundancy than benchmark systems. Although our system is unsupervised and optimized for topical coherence, we achieve a 44.1 ROUGE on the DUC-07 test set, roughly in the range of state-of-the-art supervised models.
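The abstract reports its result as a ROUGE score but does not say which ROUGE variant produced the 44.1 figure. As background only, here is a minimal sketch of ROUGE-N recall, which counts the reference n-grams that also appear in a candidate summary. It assumes lowercased whitespace tokenization with no stemming or stopword removal, so it is not the official ROUGE toolkit and is not the paper's evaluation setup. The function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Simplified ROUGE-N recall (hypothetical helper, not the official toolkit).

    Recall = clipped n-gram matches summed over all references,
    divided by the total number of reference n-grams.
    Tokenization is lowercased whitespace splitting only.
    """
    cand = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # A reference n-gram is credited at most as often as it occurs in the candidate.
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    refs = ["the cat is on the mat"]
    print(rouge_n_recall(cand, refs, n=1))  # 5 of 6 reference unigrams matched
    print(rouge_n_recall(cand, refs, n=2))  # 3 of 5 reference bigrams matched
```

Clipping each match by its count in the candidate keeps a summary from earning credit by repeating one n-gram. This is related to the non-redundancy goal the abstract describes, but the sketch measures content overlap, not coherence.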