Paper: An Ensemble Method for Selection of High Quality Parses

ACL ID P07-1052
Title An Ensemble Method for Selection of High Quality Parses
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2007

While the average performance of statistical parsers gradually improves, they still attach annotations of rather low quality to many sentences. The number of such sentences grows when the training and test data are taken from different domains, which is the case for major web applications such as information retrieval and question answering. In this paper we present a Sample Ensemble Parse Assessment (SEPA) algorithm for detecting parse quality. To assess parse quality, we use a function of the agreement among several copies of a parser, each of which is trained on a different sample from the training data. We experimented with both generative and reranking parsers (Collins, and Charniak and Johnson, respectively). We show superior results over several baselines, both when the training a...
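
The agreement idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several details are assumptions made only for illustration: a parse is represented as a set of `(label, start, end)` constituents, samples are drawn with replacement, and the agreement function is the mean pairwise F1 between ensemble members. The names `train_fn`, `train_ensemble`, and `agreement_score` are hypothetical.

```python
import random
from itertools import combinations


def f1(a, b):
    """F1 overlap between two parses, each a set of (label, start, end) tuples."""
    if not a and not b:
        return 1.0  # two empty parses agree trivially
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(b)
    return 2 * precision * recall / (precision + recall)


def train_ensemble(train_fn, training_data, n_members=5, sample_size=None, seed=0):
    """Train n_members parser copies, each on its own random sample.

    train_fn is a hypothetical callable: it takes a list of training items
    and returns a parser (a callable mapping a sentence to a parse).
    Sampling with replacement is an assumption, not taken from the abstract.
    """
    rng = random.Random(seed)
    size = sample_size or len(training_data)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(training_data) for _ in range(size)]
        members.append(train_fn(sample))
    return members


def agreement_score(members, sentence):
    """Mean pairwise F1 among the members' parses of one sentence.

    Mean pairwise F1 is an assumed choice of agreement function.
    """
    parses = [member(sentence) for member in members]
    pairs = list(combinations(parses, 2))
    if not pairs:
        return 1.0  # a single member cannot disagree with itself
    return sum(f1(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a sentence whose score exceeds a chosen threshold would be kept as a high quality parse, and the threshold itself would need to be tuned on held-out data.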