Paper: Accurate Unlexicalized Parsing

ACL ID P03-1054
Title Accurate Unlexicalized Parsing
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2003
Authors

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize. In the early ...
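
To make the idea of a "state split" concrete, the following is a minimal sketch of one common kind of split, parent annotation, in which each phrasal category is subdivided by the category of its parent. It is an illustration under stated assumptions, not the paper's implementation: the tuple-based tree encoding, the function names `parent_annotate` and `rule_counts`, the `^` separator, and the example sentence are all hypothetical choices made for this sketch.

```python
from collections import Counter

# Assumed encoding: a tree is (label, [children]); a word is a plain string.

def parent_annotate(tree, parent="ROOT"):
    """Return a copy of `tree` whose phrasal labels are split by parent label."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    # In this sketch, preterminals (tags directly over a word) are left unsplit.
    if len(children) == 1 and isinstance(children[0], str):
        return (label, list(children))
    # Children see the original, unsplit label of this node as their parent.
    return (f"{label}^{parent}", [parent_annotate(c, label) for c in children])

def rule_counts(tree, counts=None):
    """Count the rewrite rules (lhs, rhs) used in `tree`."""
    if counts is None:
        counts = Counter()
    if isinstance(tree, str):
        return counts
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        rule_counts(child, counts)
    return counts

# Hypothetical example: (S (NP (PRP She)) (VP (VBD saw) (NP (DT the) (NN dog))))
tree = ("S", [
    ("NP", [("PRP", ["She"])]),
    ("VP", [("VBD", ["saw"]),
            ("NP", [("DT", ["the"]), ("NN", ["dog"])])]),
])

annotated = parent_annotate(tree)
plain_rules = rule_counts(tree)
split_rules = rule_counts(annotated)
```

In the unsplit tree both noun phrases rewrite from the same symbol `NP`, so a grammar read off it would treat subject and object NPs as one distribution. After the split, the rules have distinct left-hand sides (`NP^S` and `NP^VP`), which is the sense in which a split removes an independence assumption of the vanilla treebank grammar.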