Paper: Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

ACL ID P12-1046
Title Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement ...
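The backoff smoothing named in the abstract rests on the Pitman-Yor predictive distribution, which interpolates observed counts with a simpler base distribution. The sketch below is a minimal, hypothetical illustration of that predictive rule in its Chinese-restaurant form (it is not the paper's implementation, and it simplifies table bookkeeping to one table per label):

```python
class PitmanYorRestaurant:
    """Chinese-restaurant view of a Pitman-Yor process (illustrative sketch).

    Simplifying assumption: each label is served at exactly one table, so
    table counts grow only when a label is first seen. A full sampler would
    let customers open new tables for already-seen labels.
    """

    def __init__(self, discount, strength, base):
        assert 0.0 <= discount < 1.0 and strength > -discount
        self.d = discount          # discount parameter (controls tail behavior)
        self.theta = strength      # strength/concentration parameter
        self.base = base           # backoff distribution: label -> probability
        self.customers = {}        # label -> number of customers
        self.tables = {}           # label -> number of tables (here: 0 or 1)
        self.total_customers = 0
        self.total_tables = 0

    def prob(self, label):
        """Predictive probability: observed mass plus backoff to the base."""
        denom = self.theta + self.total_customers
        c = self.customers.get(label, 0)
        t = self.tables.get(label, 0)
        p_observed = max(c - self.d * t, 0.0) / denom
        p_backoff = (self.theta + self.d * self.total_tables) / denom
        return p_observed + p_backoff * self.base(label)

    def add(self, label):
        """Seat one customer serving `label` (opening a table if unseen)."""
        if label not in self.customers:
            self.tables[label] = 1
            self.total_tables += 1
        self.customers[label] = self.customers.get(label, 0) + 1
        self.total_customers += 1
```

Because the backoff term carries all mass not claimed by the discounted counts, the predictive probabilities sum to one whenever the base distribution does; stacking such restaurants (each level's base being the next coarser level's predictive distribution) yields the hierarchical backoff from SR-TSG rules down to CFG rules that the abstract describes.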