Paper: Text Segmentation with Multiple Surface Linguistic Cues

ACL ID C98-2140
Title Text Segmentation with Multiple Surface Linguistic Cues
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1998
Authors

In general, a certain range of sentences in a text is widely assumed to form a coherent unit called a discourse segment. Identifying segment boundaries is a first step toward recognizing the structure of a text. In this paper, we describe a method for identifying segment boundaries in Japanese text with the aid of multiple surface linguistic cues, although our experiments are small in scale. We also present a method for automatically training the weights of these cues while avoiding the overfitting problem.