Paper: Using Linguistically Motivated Features For Paragraph Boundary Identification

ACL ID W06-1632
Title Using Linguistically Motivated Features For Paragraph Boundary Identification
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2006
Authors

In this paper we propose a machine-learning approach to paragraph boundary identification which utilizes linguistically motivated features. We investigate the relation between paragraph boundaries and discourse cues, pronominalization and information structure. We test our algorithm on German data and report improvements over three baselines including a reimplementation of Sporleder & Lapata’s (2006) work on paragraph segmentation. An analysis of the features’ contribution suggests an interpretation of what paragraph boundaries indicate and what they depend on.