Paper: Multi-Paragraph Segmentation Of Expository Text

ACL ID P94-1002
Title Multi-Paragraph Segmentation Of Expository Text
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1994
  • Marti A. Hearst (University of California at Berkeley, Berkeley CA; Palo Alto Research Center, Palo Alto CA)

This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.
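
The abstract does not spell out how lexical frequency information is turned into boundaries, so the following is only an illustrative sketch of the general idea, not either of the two versions the paper describes. It assumes, purely for illustration, that adjacent paragraphs are compared by cosine similarity over raw term counts and that a gap whose similarity falls below a fixed threshold is proposed as a subtopic boundary. The function names and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def gap_similarities(paragraphs):
    """Lexical similarity across each gap between adjacent paragraphs."""
    counts = [Counter(tokenize(p)) for p in paragraphs]
    return [cosine(counts[i], counts[i + 1]) for i in range(len(counts) - 1)]


def candidate_boundaries(paragraphs, threshold=0.1):
    """Return each index i where a boundary is proposed after paragraph i.

    The fixed threshold is an assumption made for this sketch only.
    """
    sims = gap_similarities(paragraphs)
    return [i for i, s in enumerate(sims) if s < threshold]


paragraphs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock markets fell sharply today",
    "markets and stock prices fell",
]
boundaries = candidate_boundaries(paragraphs)
```

On these four toy paragraphs the only gap with no shared vocabulary is the one between the second and third, so `boundaries` holds the single index 1. A realistic implementation would at least remove stop words and compare larger windows of text than single paragraphs.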