Paper: Statistical Models For Topic Segmentation

ACL ID P99-1046
Title Statistical Models For Topic Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1999

Most documents are about more than one subject, but many NLP and IR techniques implicitly assume documents have just one topic. We describe new clues that mark shifts to new topics, novel algorithms for identifying topic boundaries, and the uses of such boundaries once identified. We report topic segmentation performance on several corpora as well as improvement on an IR task that benefits from good segmentation.