Paper: Bayesian Unsupervised Topic Segmentation

ACL ID D08-1035
Title Bayesian Unsupervised Topic Segmentation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2008
Authors

This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of well-formed segments to induce a compact and consistent lexical distribution. We show that lexical cohesion can be placed in a Bayesian context by modeling the words in each topic segment as draws from a multinomial language model associated with the segment; maximizing the observation likelihood in such a model yields a lexically-cohesive segmentation. This contrasts with previous approaches, which relied on hand-crafted cohesion metrics. The Bayesian framework provides a principled way to incorporate additional features such as cue phrases, a powerful indicator of discourse structure that has not been previously...
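
The central idea of the abstract, scoring a segmentation by the likelihood of each segment's words under that segment's own multinomial language model, can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not state the estimator or the search procedure, so the sketch assumes an additively smoothed maximum-likelihood language model per segment, a number of segments fixed in advance, and an exhaustive dynamic program over sentence boundaries. The names `segment_log_likelihood`, `best_segmentation`, `alpha`, and `num_segments` are hypothetical, and cue phrases are not modeled.

```python
import math
from collections import Counter


def segment_log_likelihood(sentences, alpha, vocab_size):
    """Log-likelihood of one segment's words under a single multinomial
    estimated from that segment (additive smoothing is an assumption)."""
    counts = Counter(word for sent in sentences for word in sent)
    denom = sum(counts.values()) + alpha * vocab_size
    return sum(c * math.log((c + alpha) / denom) for c in counts.values())


def best_segmentation(sentences, num_segments, alpha=0.1):
    """Return the start indices of segments 2..K that maximize the total
    log-likelihood, found by dynamic programming over sentence boundaries."""
    n = len(sentences)
    vocab_size = len({word for sent in sentences for word in sent})
    neg_inf = float("-inf")
    # best[k][j]: best score for splitting sentences[:j] into k segments
    best = [[neg_inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == neg_inf:
                    continue
                score = best[k - 1][i] + segment_log_likelihood(
                    sentences[i:j], alpha, vocab_size
                )
                if score > best[k][j]:
                    best[k][j] = score
                    back[k][j] = i
    # Walk the back-pointers from the end to recover segment start indices.
    starts, j = [], n
    for k in range(num_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    return sorted(starts)[1:]  # drop the leading 0 (start of first segment)


# Toy document: two sentences about pets, then two about finance.
doc = [
    ["cat", "dog", "cat"],
    ["dog", "cat", "pet"],
    ["stock", "market", "bank"],
    ["bank", "stock", "trade"],
]
print(best_segmentation(doc, num_segments=2))
```

In this toy document the highest-scoring two-way split places the boundary before the third sentence, because each half then concentrates its probability mass on a small, consistent vocabulary. That is the lexical-cohesion effect the abstract describes, obtained here from likelihood alone rather than from a hand-crafted cohesion metric.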