Paper: A Grammatico-Statistical Approach To Discourse Partitioning

ACL ID C94-2187
Title A Grammatico-Statistical Approach To Discourse Partitioning
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1994

The paper presents a new approach to text segmentation, which concerns dividing a text into coherent discourse units. The approach builds on the theory of discourse segment (Nomoto and Nitta, 1993), incorporating ideas from the research on information retrieval (Salton, 1988). A discourse segment has to do with a structure of Japanese discourse; it could be thought of as a linguistic unit demarcated by wa, a Japanese topic particle, which may extend over several sentences. The segmentation works with discourse segments and makes use of a coherence measure based on tfidf, a standard information retrieval measurement (Salton, 1988; Hearst, 1993). Experiments have been done with a Japanese newspaper corpus. It has been found that the present approach is quite successful in reco...
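As a rough illustration of what a tfidf-based coherence measure between neighbouring discourse segments might look like, the sketch below weights terms by tf-idf and scores each adjacent pair of segments by cosine similarity. It is a generic sketch and not the paper's formulation, which this excerpt does not give: the function names (`tfidf_vectors`, `cosine`, `adjacent_coherence`), the choice of cosine similarity, the logarithmic idf, and the toy English tokens are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(segments):
    """Map each segment (a list of tokens) to a {term: tf * idf} dict.

    Assumption: idf = log(N / df), where N is the number of segments and
    df is the number of segments containing the term.
    """
    n = len(segments)
    df = Counter(term for seg in segments for term in set(seg))
    vectors = []
    for seg in segments:
        tf = Counter(seg)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_coherence(segments):
    """Return one similarity score per pair of neighbouring segments."""
    vecs = tfidf_vectors(segments)
    return [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


# Toy data (hypothetical): three already-tokenised segments.
segments = [
    ["bank", "rate", "loan"],
    ["bank", "loan", "interest"],
    ["typhoon", "rain", "coast"],
]
scores = adjacent_coherence(segments)
```

Under these assumptions, a low score between two neighbouring segments would suggest a candidate boundary between discourse units, while a high score would suggest that the two segments belong together.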