Paper: The effect of non-tightness on Bayesian estimation of PCFGs

ACL ID P13-1102
Title The effect of non-tightness on Bayesian estimation of PCFGs
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2013
Authors

Probabilistic context-free grammars have the unusual property of not always defin- ing tight distributions (i.e., the sum of the ?probabilities? of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian es- timation of PCFGs. We begin by present- ing the notion of ?almost everywhere tight grammars? and show that linear CFGs fol- low it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically.