Paper: Minimizing Manual Annotation Cost In Supervised Training From Corpora

ACL ID P96-1042
Title Minimizing Manual Annotation Cost In Supervised Training From Corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1996
Authors

Corpus-based methods for natural language processing often use supervised training, requiring expensive manual annotation of training corpora. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for labeling (annotation) only those that are most informative at each stage. This avoids redundantly annotating examples that contribute little new information. This paper extends our previous work on committee-based sample selection for probabilistic classifiers. We describe a family of methods for committee-based sample selection, and report experimental results for the task of stochastic part-of-speech tagging. We find that all variants achieve a signifi...
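The abstract describes committee-based sample selection only at a high level. What follows is a minimal sketch of the general idea, not the authors' implementation. It assumes a toy classifier that tags an isolated word from a table of P(tag | word). A committee is built by sampling that table from a Dirichlet posterior over the labeled counts. Each unlabeled example is scored by the vote entropy of the committee's predictions, and the highest-scoring examples are selected for annotation. All function names, the uniform prior, and the batch-selection rule are illustrative assumptions.

```python
import math
import random
from collections import Counter

# Illustrative toy model (an assumption, not the paper's HMM tagger): each
# unlabeled example is one word, and a classifier is a table P(tag | word).


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via Gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_committee_member(counts, vocab, tags, rng, prior=1.0):
    """Sample one P(tag | word) table from the posterior given labeled counts.

    counts: {word: {tag: count}} from the examples annotated so far.
    prior:  hypothetical symmetric Dirichlet pseudo-count per tag.
    """
    member = {}
    for word in sorted(vocab):  # sorted so a fixed seed gives a fixed committee
        seen = counts.get(word, {})
        alphas = [seen.get(t, 0) + prior for t in tags]
        member[word] = dict(zip(tags, sample_dirichlet(alphas, rng)))
    return member


def vote_entropy(votes):
    """Entropy (in nats) of the distribution of the committee's votes."""
    k = len(votes)
    return -sum((c / k) * math.log(c / k) for c in Counter(votes).values())


def select_batch(pool, counts, tags, k=5, batch_size=2, seed=0):
    """Return indices of the batch_size pool examples the committee disputes most."""
    rng = random.Random(seed)
    vocab = set(counts) | set(pool)
    committee = [sample_committee_member(counts, vocab, tags, rng) for _ in range(k)]
    scored = []
    for i, word in enumerate(pool):
        # Each member votes for its most probable tag for this word.
        votes = [max(tags, key=lambda t: m[word][t]) for m in committee]
        scored.append((vote_entropy(votes), i))
    scored.sort(reverse=True)  # highest disagreement first
    return [i for _, i in scored[:batch_size]]


if __name__ == "__main__":
    counts = {"the": {"DET": 50}, "run": {"NOUN": 3, "VERB": 4}}
    pool = ["the", "run", "bank", "the"]
    print(select_batch(pool, counts, ["DET", "NOUN", "VERB"], k=7, batch_size=2))
```

In a full selection loop, the chosen examples would be annotated, their counts added to `counts`, and the committee resampled before the next round. Because members are drawn from the posterior rather than fixed, they tend to agree on words with many labeled occurrences (such as `"the"` above) and disagree on sparsely observed or unseen words (such as `"bank"`). This is one way to operationalize "most informative" in this sketch.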