Paper: Topic Segmentation with Hybrid Document Indexing

ACL ID D07-1037
Title Topic Segmentation with Hybrid Document Indexing
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007

We present a domain-independent unsupervised topic segmentation approach based on hybrid document indexing. Lexical chains have been successfully employed to evaluate lexical cohesion of text segments and to predict topic boundaries. Our approach is based on the notion of semantic cohesion. It uses spectral embedding to estimate semantic association between content nouns over a span of multiple text segments. Our method significantly outperforms the baseline on the topic segmentation task and achieves performance comparable to state-of-the-art methods that incorporate domain-specific information.
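
To make the idea of semantic cohesion from a spectral embedding concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not specify the embedding, the indexing scheme, or the boundary criterion, so everything below is an assumption chosen for illustration. The function names (`spectral_embed`, `adjacent_cohesion`), the co-occurrence affinity, the normalization, and the cosine comparison of adjacent segments are all hypothetical.

```python
import numpy as np


def spectral_embed(affinity, dim):
    """Embed items with the top eigenvectors of the symmetrically
    normalized affinity matrix D^-1/2 A D^-1/2 (an assumed choice)."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    normalized = affinity * inv_sqrt[:, None] * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so keep the last columns.
    _, eigvecs = np.linalg.eigh(normalized)
    return eigvecs[:, -dim:]


def adjacent_cohesion(segments, dim=2):
    """Return one cohesion score per pair of adjacent segments.

    `segments` is a list of lists of content nouns (assumed to be
    extracted beforehand). Low scores suggest candidate topic boundaries.
    """
    vocab = sorted({noun for seg in segments for noun in seg})
    index = {noun: i for i, noun in enumerate(vocab)}

    # Noun-by-segment count matrix.
    counts = np.zeros((len(vocab), len(segments)))
    for j, seg in enumerate(segments):
        for noun in seg:
            counts[index[noun], j] += 1

    # Noun-noun affinity from co-occurrence across all segments (assumption).
    affinity = counts @ counts.T
    noun_vecs = spectral_embed(affinity, dim)

    # Represent each segment as the sum of its nouns' embeddings.
    seg_vecs = counts.T @ noun_vecs
    norms = np.maximum(np.linalg.norm(seg_vecs, axis=1), 1e-12)
    unit = seg_vecs / norms[:, None]

    # Cosine similarity between each segment and the next one.
    return [float(unit[i] @ unit[i + 1]) for i in range(len(segments) - 1)]


if __name__ == "__main__":
    toy_segments = [
        ["bank", "loan"],
        ["loan", "interest", "bank"],
        ["bank", "interest"],
        ["galaxy", "star"],
        ["star", "telescope"],
        ["galaxy", "telescope"],
    ]
    scores = adjacent_cohesion(toy_segments)
    # Report the gap with the weakest cohesion as the candidate boundary.
    print("candidate boundary after segment", scores.index(min(scores)) + 1)
```

In this toy input the two groups of segments share no nouns, so the weakest cohesion falls between the third and fourth segments. Real text would need noun extraction, a larger context window, and a principled threshold, none of which are covered here.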