Paper: Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets

ACL ID P07-1078
Title Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2007
Authors

Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use self- training in order to improve the quality of a parser and to adapt it to a different do- main, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the out- of-domain adaptation scenario. In particu- lar, we achieve 50% reduction in annotation cost for the in-domain case, yielding an im- provement of 66% over previous work, and a 20-33% reduction for the domain adaptation case. This is the first time that self-training with small labeled datasets is applied suc- cessfully...