Paper: Self-Training for Biomedical Parsing

ACL ID P08-2026
Title Self-Training for Biomedical Parsing
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2008

Parser self-training is the technique of taking an existing parser, parsing extra data and then creating a second parser by treating the extra data as further training data. Here we apply this tech- nique to parser adaptation. In partic- ular, we self-train the standard Char- niak/Johnson Penn-Treebank parser us- ing unlabeled biomedical abstracts. This achieves an f-score of 84.3% on a stan- dard test set of biomedical abstracts from the Genia corpus. This is a 20% error re- duction over the best previous result on biomedical data (80.2% on the same test set).