Paper: Sample Selection for Statistical Parsers: Cognitively Driven Algorithms and Evaluation Measures

ACL ID W09-1103
Title Sample Selection for Statistical Parsers: Cognitively Driven Algorithms and Evaluation Measures
Venue International Conference on Computational Natural Language Learning
Session Main Conference
Year 2009
Authors

Creating large amounts of manually annotated training data for statistical parsers imposes heavy cognitive load on the human annotator and is thus costly and error prone. It is hence of high importance to decrease the human efforts involved in creating training data without harming parser performance. For constituency parsers, these efforts are traditionally evaluated using the total number of constituents (TC) measure, assuming uniform cost for each annotated item. In this paper, we introduce novel measures that quantify aspects of the cognitive efforts of the human annotator that are not reflected by the TC measure, and show that they are well established in the psycholinguistic literature. We present a novel parameter-based sample selection approach for creating good samples in te...