Paper: Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

ACL ID D10-1038
Title Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

This work concerns automatic topic segmen- tation of email conversations. We present a corpus of email threads manually annotated with topics, and evaluate annotator reliabil- ity. To our knowledge, this is the first such email corpus. We show how the existing topic segmentation models (i.e., Lexical Chain Seg- menter (LCSeg) and Latent Dirichlet Alloca- tion (LDA)) which are solely based on lex- ical information, can be applied to emails. By pointing out where these methods fail and what any desired model should consider, we propose two novel extensions of the models that not only use lexical information but also exploit finer level conversation structure in a principled way. Empirical evaluation shows that LCSeg is a better model than LDA for segmenting an email thread into topical clus-...