Paper: Uptraining for Accurate Deterministic Question Parsing

ACL ID D10-1069
Title Uptraining for Accurate Deterministic Question Parsing
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010

It is well known that parsing accuracies drop significantly on out-of-domain data. What is less known is that some parsers suffer more from domain shifts than others. We show that dependency parsers have more difficulty parsing questions than constituency parsers. In particular, deterministic shift-reduce depen- dency parsers, which are of highest interest for practical applications because of their lin- ear running time, drop to 60% labeled accu- racy on a question test set. We propose an uptraining procedure in which a deterministic parser is trained on the output of a more ac- curate, but slower, latent variable constituency parser (converted to dependencies). Uptrain- ing with 100K unlabeled questions achieves results comparable to having 2K labeled ques- tions for training. With 100K ...