Paper: Learning Information Structure In The Prague Treebank

ACL ID P05-2020
Title Learning Information Structure In The Prague Treebank
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2005

This paper investigates the automatic identification of aspects of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features, together with derived features inspired by the annotation guidelines. We report the performance of C4.5, Bagging, and Ripper classifiers on several classes of instances, such as nouns and pronouns together, only nouns, and only pronouns. A baseline system that always assigns f(ocus) has an F-score of 42.5%.
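
To make the baseline concrete, the following is a minimal sketch of an "always f(ocus)" classifier and of how an F-score could be computed for it. The abstract does not say how the reported F-score is averaged over the two classes, so the class-frequency weighting used here is an assumption. The function names and the toy gold labels are hypothetical and are not taken from the Prague Dependency Treebank.

```python
from collections import Counter


def f_score(gold, pred, label):
    """Per-class F-score (harmonic mean of precision and recall) for one label."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f_score(gold, pred):
    """Average of per-class F-scores, weighted by class frequency in gold.

    Assumption: the abstract does not state which averaging it uses.
    """
    counts = Counter(gold)
    total = len(gold)
    return sum(counts[label] / total * f_score(gold, pred, label)
               for label in counts)


def always_focus_baseline(instances):
    """Hypothetical baseline: label every instance as f(ocus)."""
    return ["f"] * len(instances)


# Toy gold labels, invented for illustration only ("t" = topic, "f" = focus).
gold = ["f", "t", "f", "f", "t"]
pred = always_focus_baseline(gold)

print("F-score for f:", f_score(gold, pred, "f"))
print("F-score for t:", f_score(gold, pred, "t"))
print("Weighted F-score:", weighted_f_score(gold, pred))
```

Because the baseline never predicts t(opic), its recall on that class is zero, which pulls any class-averaged F-score well below its accuracy on the f(ocus) class alone.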