Paper: Reranking And Self-Training For Parser Adaptation

ACL ID P06-1043
Title Reranking And Self-Training For Parser Adaptation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006

Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard “Charniak parser” checks in at a labeled precision-recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus. This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%...
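
For readers unfamiliar with the metric, the labeled precision-recall f-measure quoted above is conventionally the harmonic mean of labeled bracket precision and recall. The following is a minimal sketch under stated assumptions, not the scoring procedure used in the paper: constituents are represented as (label, start, end) tuples, the function name `labeled_f_measure` is hypothetical, and the toy trees are invented for illustration.

```python
from collections import Counter


def labeled_f_measure(gold, predicted):
    """Harmonic mean of labeled precision and recall over constituents.

    Each constituent is a (label, start, end) tuple (an assumed
    representation chosen for this sketch).
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a predicted bracket is matched at most
    # as many times as it occurs in the gold tree.
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy example: the parser mislabels one of four constituents.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
score = labeled_f_measure(gold, pred)
```

In this toy case three of the four brackets match in both directions, so precision, recall, and the f-measure all come out at 0.75. Standard evaluation tools apply further conventions (for example, around punctuation) that this sketch leaves out.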