Paper: Is the End of Supervised Parsing in Sight?

ACL ID P07-1051
Title Is the End of Supervised Parsing in Sight?
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2007
  • Rens Bod (University of St. Andrews, St. Andrews UK; University of Amsterdam, Amsterdam The Netherlands)

How far can we get with unsupervised parsing if we make our training corpus several orders of magnitude larger than has hitherto been attempted? We present a new algorithm for unsupervised parsing using an all-subtrees model, termed U-DOP*, which parses directly with packed forests of all binary trees. We train both on Penn’s WSJ data and on the (much larger) NANC corpus, showing that U-DOP* outperforms a treebank-PCFG on the standard WSJ test set. While U-DOP* performs worse than state-of-the-art supervised parsers on hand-annotated sentences, we show that the model outperforms supervised parsers when evaluated as a language model in syntax-based machine translation on Europarl. We argue that supervised parsers miss the fluidity between constituents and non-constituents and that in the f...