Paper: Segmenting Email Message Text into Zones

ACL ID D09-1096
Title Segmenting Email Message Text into Zones
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009

In the early days of email, widely used conventions for indicating quoted reply content and email signatures made it easy to segment email messages into their functional parts. Today, the explosion of different email formats and styles, coupled with the ad hoc ways in which people vary the structure and layout of their messages, means that simple techniques for identifying quoted replies that used to yield 95% accuracy now find less than 10% of such content. In this paper, we describe Zebra, an SVM-based system for segmenting the body text of email messages into nine zone types based on graphic, orthographic and lexical cues. Zebra performs this task with an accuracy of 87.01%; when the number of zones is abstracted to two or three zone classes, this increases to 93.60% and 91.53% ...
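
The abstract does not spell out the "simple techniques" it refers to. As an illustration only, the following Python sketch shows what a convention-based baseline of that kind might look like, assuming the two long-standing plain-text conventions: a leading ">" marks quoted reply content, and a line consisting of "--" (optionally followed by a space) starts the signature. The function name `label_lines` and the label names are hypothetical, and this is not Zebra's SVM classifier.

```python
# Hypothetical convention-based baseline (NOT the Zebra system).
# Assumptions: ">" prefixes quoted lines; a "--" line opens the signature.

def label_lines(body: str) -> list[tuple[str, str]]:
    """Return (label, line) pairs for each line of an email body."""
    labelled = []
    in_signature = False
    for line in body.splitlines():
        if in_signature:
            # Everything after the delimiter is treated as signature text.
            label = "signature"
        elif line.strip() == "--":
            # Signature delimiter; trailing whitespace is tolerated.
            in_signature = True
            label = "signature"
        elif line.lstrip().startswith(">"):
            label = "quoted"
        else:
            label = "author"
        labelled.append((label, line))
    return labelled


if __name__ == "__main__":
    message = "Hi Bob,\n> old text\nSee above.\n-- \nAlice"
    for label, line in label_lines(message):
        print(f"{label:10s}| {line}")
```

A baseline like this works only when senders follow those conventions, which is the weakness the abstract points to: messages that quote replies without a ">" prefix, or sign off without a delimiter, are labelled entirely as author text.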