Paper: Sample Selection For Statistical Grammar Induction

ACL ID W00-1306
Title Sample Selection For Statistical Grammar Induction
Venue 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Session Main Conference
Year 2000
Authors

Corpus-based grammar induction relies on using many hand-parsed sentences as training examples. However, the construction of a training corpus with detailed syntactic analysis for every sentence is a labor-intensive task. We propose to use sample selection methods to minimize the amount of annotation needed in the training data, thereby reducing the workload of the human annotators. This paper shows that the amount of annotated training data can be reduced by 36% without degrading the quality of the induced grammars.