Paper: Spoken And Written News Story Segmentation Using Lexical Chains

ACL ID N03-3009
Venue Human Language Technologies
Session Student Session
Year 2003
Authors

In this paper we describe a novel approach to lexical-chain-based segmentation of broadcast news stories. Our segmentation system, SeLeCT, is compared with two other lexical-cohesion-based segmenters, TextTiling and C99. Using the Pk and WindowDiff evaluation metrics, we show that SeLeCT outperforms both systems on spoken news transcripts (CNN), while the C99 algorithm performs best on the written newswire collection (Reuters). We also examine the differences between spoken and written news styles and how these differences can affect segmentation accuracy.
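The abstract names Pk and WindowDiff without defining them. The sketch below follows their commonly used definitions. It is not the evaluation code used for SeLeCT, and the window size k (half the mean reference segment length) is an assumed convention. Each segmentation is a list of per-unit segment labels (for example, one label per sentence). The helper names `labels_from_lengths`, `default_k`, `pk` and `window_diff` are hypothetical. Pk counts probe pairs (i, i + k) where reference and hypothesis disagree about whether both units share a segment. WindowDiff counts windows where the two place a different number of boundaries. Both scores lie between 0 and 1, and lower is better.

```python
from typing import List, Optional, Sequence


def labels_from_lengths(lengths: Sequence[int]) -> List[int]:
    """Expand segment lengths such as [3, 2] into per-unit labels [0, 0, 0, 1, 1]."""
    labels: List[int] = []
    for seg_id, length in enumerate(lengths):
        labels.extend([seg_id] * length)
    return labels


def _validate(ref: Sequence[int], hyp: Sequence[int]) -> None:
    # Labels must start at 0 and grow by 0 or 1 per unit, so that
    # labels[j] - labels[i] equals the number of boundaries between i and j.
    if not ref or len(ref) != len(hyp):
        raise ValueError("segmentations must be non-empty and equally long")
    for seq in (ref, hyp):
        if seq[0] != 0 or any(b - a not in (0, 1) for a, b in zip(seq, seq[1:])):
            raise ValueError("labels must start at 0 and increase by 0 or 1")


def default_k(ref: Sequence[int]) -> int:
    """Half the mean reference segment length (assumed convention), at least 1."""
    return max(1, round(len(ref) / len(set(ref)) / 2))


def _resolve_k(ref: Sequence[int], hyp: Sequence[int], k: Optional[int]) -> int:
    _validate(ref, hyp)
    k = default_k(ref) if k is None else k
    if not 0 < k < len(ref):
        raise ValueError("k must satisfy 0 < k < number of units")
    return k


def pk(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Fraction of probe pairs (i, i + k) on which ref and hyp disagree about
    whether both units lie in the same segment."""
    k = _resolve_k(ref, hyp, k)
    n = len(ref)
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(n - k)
    )
    return errors / (n - k)


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: Optional[int] = None) -> float:
    """Fraction of windows (i, i + k) in which ref and hyp place a different
    number of boundaries."""
    k = _resolve_k(ref, hyp, k)
    n = len(ref)
    errors = sum(
        (ref[i + k] - ref[i]) != (hyp[i + k] - hyp[i]) for i in range(n - k)
    )
    return errors / (n - k)


if __name__ == "__main__":
    ref = labels_from_lengths([3, 3, 3])     # [0, 0, 0, 1, 1, 1, 2, 2, 2]
    hyp = labels_from_lengths([3, 1, 2, 3])  # one spurious extra boundary
    print(default_k(ref))                    # 2
    print(round(pk(ref, hyp), 3))            # 0.143 (1 of 7 probes wrong)
    print(round(window_diff(ref, hyp), 3))   # 0.286 (2 of 7 windows wrong)
```

In this toy example the spurious boundary costs WindowDiff two windows but Pk only one probe. WindowDiff compares boundary counts inside each window, while Pk only asks whether the two probe ends fall in the same segment.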