Paper: Cohesion And Collocation: Using Context Vectors In Text Segmentation

ACL ID P99-1077
Title Cohesion And Collocation: Using Context Vectors In Text Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1999
Authors

Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997).

1 Background

The notion of text cohesion rests on the intuition that a text is "held together" by a variety of internal forces. Much of the relevant linguistic literature is indebted to Halliday and Hasan (1976), where cohesion is defi...
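The abstract's central idea is that related but non-identical words can still signal cohesion if each word carries a vector describing its typical contexts. A small sketch can illustrate this. The Python below is a hypothetical illustration, not the VecTile implementation. The two-dimensional `CONTEXT_VECTORS` table, summing word vectors into block vectors, the cosine measure, the `WINDOW` size, and the minimum-similarity boundary rule are all assumptions made for the example. It contrasts a context-vector similarity curve with a string-identity baseline in the spirit of TextTiling.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical, hand-made "pre-compiled" context vectors (2 dimensions,
# roughly "medical" and "music" contexts). Real vectors would be derived
# from co-occurrence statistics in a training corpus.
CONTEXT_VECTORS: Dict[str, Vector] = {
    "doctor": [1.0, 0.0], "nurse": [0.9, 0.1],
    "patient": [0.8, 0.0], "clinic": [1.0, 0.1],
    "guitar": [0.0, 1.0], "drum": [0.1, 0.9],
    "band": [0.0, 0.8], "song": [0.1, 1.0],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def block_vector(tokens: Sequence[str], vectors: Dict[str, Vector], dim: int) -> Vector:
    """Sum the context vectors of the known words in a block (assumed aggregation)."""
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(vectors.get(tok, [])):
            total[i] += x
    return total


def vector_curve(tokens: Sequence[str], vectors: Dict[str, Vector], window: int) -> List[float]:
    """Similarity at each gap between the `window` tokens before and after it."""
    dim = len(next(iter(vectors.values())))
    return [
        cosine(block_vector(tokens[g - window:g], vectors, dim),
               block_vector(tokens[g:g + window], vectors, dim))
        for g in range(window, len(tokens) - window + 1)
    ]


def string_curve(tokens: Sequence[str], window: int) -> List[float]:
    """String-identity baseline: cosine over raw word counts in each block."""
    curve = []
    for g in range(window, len(tokens) - window + 1):
        left = Counter(tokens[g - window:g])
        right = Counter(tokens[g:g + window])
        vocab = sorted(set(left) | set(right))
        curve.append(cosine([left[w] for w in vocab], [right[w] for w in vocab]))
    return curve


tokens = ["doctor", "nurse", "patient", "clinic", "guitar", "drum", "band", "song"]
WINDOW = 2
vec = vector_curve(tokens, CONTEXT_VECTORS, WINDOW)
txt = string_curve(tokens, WINDOW)

# Hypothetical boundary rule: place a break at the gap with the lowest similarity.
boundary = WINDOW + min(range(len(vec)), key=vec.__getitem__)
print([round(s, 2) for s in vec])  # similarity dips in the middle
print(txt)                          # all 0.0: no word recurs across blocks
print("boundary before token", boundary, repr(tokens[boundary]))
```

On this toy input the string-based curve is flat at zero because no word is repeated, so it offers no evidence about where the topic shifts. The vector-based curve drops sharply between "clinic" and "guitar". A realistic setup would use high-dimensional corpus-derived vectors and a more robust boundary criterion than a single global minimum.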