Paper: Word Distributions For Thematic Segmentation In A Support Vector Machine Approach

ACL ID W06-2914
Title Word Distributions For Thematic Segmentation In A Support Vector Machine Approach
Venue International Conference on Computational Natural Language Learning
Session Main Conference
Year 2006
Authors

We investigate the appropriateness of using a technique based on support vector machines (SVMs) for identifying the thematic structure of text streams. The thematic segmentation task is modeled as a binary classification problem, in which the two classes correspond to the presence or absence of a thematic boundary. Experiments are conducted with this approach using features based on word distributions through the text. We provide empirical evidence that the approach is robust by showing good performance on three different data sets. In particular, a substantial improvement over previously published results of word-distribution-based systems is obtained when evaluation is done on a corpus of recorded and transcribed multi-party dialogs.
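The abstract names neither the exact word-distribution features nor the SVM software, so what follows is only a minimal sketch under stated assumptions. It illustrates the framing described above, in which each gap between sentences becomes one binary example (+1 = thematic boundary, -1 = none). Everything named here is hypothetical and is not taken from the paper: the helper names (`boundary_features`, `make_dataset`, `train_linear_svm`), the `window` size, and the two features (cosine similarity between the word counts on either side of the gap, and the fraction of previously unseen word types). The classifier is a plain linear SVM trained with a Pegasos-style stochastic subgradient method, chosen for illustration only.

```python
import math
import random
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boundary_features(sentences, i, window=2):
    """Hypothetical word-distribution features for the gap before sentence i."""
    left = Counter(w for s in sentences[max(0, i - window):i] for w in s)
    right = Counter(w for s in sentences[i:i + window] for w in s)
    similarity = cosine(left, right)
    # Fraction of word types in the right window that never occur in the left one.
    new_types = len(set(right) - set(left)) / max(1, len(right))
    return [similarity, new_types, 1.0]  # constant 1.0 acts as a bias feature


def make_dataset(sentences, gold_boundaries, window=2):
    """One example per inter-sentence gap: +1 = thematic boundary, -1 = none."""
    X, y = [], []
    for i in range(1, len(sentences)):
        X.append(boundary_features(sentences, i, window))
        y.append(1 if i in gold_boundaries else -1)
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for j in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[j] * sum(wk * xk for wk, xk in zip(w, X[j]))
            w = [(1.0 - eta * lam) * wk for wk in w]  # regularization shrink
            if margin < 1:  # margin violated: move w toward y * x
                w = [wk + eta * y[j] * xk for wk, xk in zip(w, X[j])]
    return w


def predict(w, x):
    """+1 if the linear score is positive (boundary), else -1."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) > 0 else -1


# Toy usage: two themes (cats, then stock markets); the gold boundary is before sentence 3.
doc = [s.split() for s in [
    "the cat sat on the mat",
    "the cat chased a mouse",
    "a mouse ran from the cat",
    "stock prices fell sharply today",
    "investors sold stock as prices fell",
    "markets and prices remain volatile",
]]
X, y = make_dataset(doc, gold_boundaries={3})
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

In this sketch, the gap that starts the new theme shares no vocabulary with the preceding window, so its similarity drops to 0 and every word type on its right side is new. A richer feature set and a tuned SVM package would be needed for realistic data. Boundary examples are also far rarer than non-boundary ones, and a real system would likely have to handle that class imbalance.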