Paper: Sample Selection For Statistical Grammar Induction

ACL ID W00-1306
Title Sample Selection For Statistical Grammar Induction
Venue 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Session Main Conference
Year 2000

Corpus-based grammar induction relies on using many hand-parsed sentences as training examples. However, the construction of a training corpus with detailed syntactic analysis for every sentence is a labor-intensive task. We propose to use sample selection methods to minimize the amount of annotation needed in the training data, thereby reducing the workload of the human annotators. This paper shows that the amount of annotated training data can be reduced by 36% without degrading the quality of the induced grammars.