Paper: Text Segmentation with LDA-Based Fisher Kernel

ACL ID: P08-2068
Title: Text Segmentation with LDA-Based Fisher Kernel
Venue: Annual Meeting of the Association for Computational Linguistics
Session: Main Conference
Year: 2008

In this paper we propose a domain-independent text segmentation method that consists of three components. Latent Dirichlet allocation (LDA) is employed to compute the semantic distribution of words, and semantic similarity is measured with the Fisher kernel. Finally, the globally optimal segmentation is obtained by dynamic programming. Experiments on Chinese data sets show that the technique can be effective. Because it incorporates latent semantic information, our algorithm is robust to irregular-sized segments.
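The abstract does not give the Fisher kernel formula or the exact objective that the dynamic program optimizes. The following is therefore only a minimal sketch of the general idea, "kernel similarity between sentences plus dynamic programming for a globally optimal split". It rests on several assumptions. The objective is kernel within-segment scatter (each sentence's squared distance to its segment mean in kernel feature space) with a fixed number of segments, which is a stand-in and not the paper's cost. A plain linear kernel over hypothetical per-sentence topic vectors takes the place of the LDA-based Fisher kernel. The function names `segment_cost_table` and `best_segmentation` are illustrative only.

```python
import numpy as np


def segment_cost_table(K):
    """Kernel within-segment scatter for every span [i, j).

    cost[i, j] = sum_{a in S} K[a, a] - (1/|S|) * sum_{a, b in S} K[a, b],
    the total squared distance of each item to its segment mean in the
    kernel's feature space. Assumed stand-in objective, not the paper's.
    """
    n = K.shape[0]
    # 2-D prefix sums: P[a, b] = K[:a, :b].sum(), so block sums are O(1)
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = np.cumsum(np.cumsum(K, axis=0), axis=1)
    diag = np.concatenate([[0.0], np.cumsum(np.diag(K))])
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 1, n + 1):
            block = P[j, j] - P[i, j] - P[j, i] + P[i, i]
            cost[i, j] = (diag[j] - diag[i]) - block / (j - i)
    return cost


def best_segmentation(K, num_segments):
    """Globally optimal split of n items into num_segments contiguous spans.

    Returns the interior boundary indices (start of each segment after the first).
    """
    n = K.shape[0]
    assert 1 <= num_segments <= n
    cost = segment_cost_table(K)
    # D[k, j]: minimal cost of splitting the first j items into k segments
    D = np.full((num_segments + 1, n + 1), np.inf)
    back = np.zeros((num_segments + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            # Last segment is [i, j); the first k-1 segments cover [0, i)
            for i in range(k - 1, j):
                cand = D[k - 1, i] + cost[i, j]
                if cand < D[k, j]:
                    D[k, j] = cand
                    back[k, j] = i
    # Walk the back-pointers from the end to recover segment starts
    starts, j = [], n
    for k in range(num_segments, 0, -1):
        i = int(back[k, j])
        starts.append(i)
        j = i
    return sorted(starts)[1:]  # drop the leading 0


# Hypothetical per-sentence topic proportions (3 topics), standing in for LDA output
X = np.array([
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.85, 0.1, 0.05],
    [0.1, 0.8, 0.1],
    [0.05, 0.9, 0.05],
    [0.1, 0.1, 0.8],
    [0.05, 0.05, 0.9],
])
K = X @ X.T  # linear kernel as a placeholder for the Fisher kernel
print(best_segmentation(K, 3))  # [3, 5]
```

Because the cost depends only on the kernel matrix, a Fisher kernel matrix could be substituted for `K` without changing the dynamic program. The triple loop runs in O(k·n²) time for k segments and n sentences.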