Paper: Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations

ACL ID P12-1081
Title Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and...