Paper: Text Segmentation Using Exponential Models

ACL ID W97-0304
Title Text Segmentation Using Exponential Models
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 1997

This paper introduces a new statistical approach to partitioning text automatically into coherent segments. Our approach enlists both short-range and long-range language models to help it sniff out likely sites of topic changes in text. To aid its search, the system consults a set of simple lexical hints it has learned to associate with the presence of boundaries through inspection of a large corpus of annotated data. We also propose a new probabilistically motivated error metric for use by the natural language processing and information retrieval communities, intended to supersede precision and recall for appraising segmentation algorithms. Qualitative assessment of our algorithm as well as evaluation using this new metric demonstrate the effectiveness of our approach in t...