Paper: Bayesian Inference for PCFGs via Markov Chain Monte Carlo

ACL ID N07-1018
Title Bayesian Inference for PCFGs via Markov Chain Monte Carlo
Venue Human Language Technologies
Session Main Conference
Year 2007
Authors

This paper presents two Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of probabilistic context free grammars (PCFGs) from terminal strings, providing an alternative to maximum-likelihood estimation using the Inside-Outside algorithm. We illustrate these methods by estimating a sparse grammar describing the morphology of the Bantu language Sesotho, demonstrating that with suitable priors Bayesian techniques can infer linguistic structure in situations where maximum likelihood methods such as the Inside-Outside algorithm only produce a trivial grammar.
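
The abstract does not describe the samplers themselves. As a rough illustration of how a prior can favor a sparse grammar, the sketch below assumes a symmetric Dirichlet prior over each nonterminal's rule probabilities (an assumption, not stated above) and resamples those probabilities from the resulting Dirichlet posterior given rule counts. This is one step a Gibbs-style sampler could take, not the paper's algorithm. The names `sample_dirichlet`, `resample_rule_probs`, `rule_counts`, and `alpha`, along with the toy grammar and counts, are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def resample_rule_probs(rule_counts, alpha, rng):
    """Sample rule probabilities for each nonterminal.

    Assumes a symmetric Dirichlet(alpha) prior per nonterminal, so the
    posterior given the counts is Dirichlet(alpha + count) by conjugacy.
    """
    theta = {}
    for lhs, counts in rule_counts.items():
        rules = sorted(counts)
        probs = sample_dirichlet([alpha + counts[r] for r in rules], rng)
        theta[lhs] = dict(zip(rules, probs))
    return theta


rng = random.Random(0)

# Hypothetical counts of how often each rule is used in the parse trees
# currently assigned to the training strings (toy values, not real data).
rule_counts = {
    "Word": {("Stem",): 3, ("Prefix", "Stem"): 12, ("Stem", "Suffix"): 0},
    "Prefix": {("a",): 7, ("o",): 5},
}

# alpha < 1 tends to push probability mass away from rarely used rules.
theta = resample_rule_probs(rule_counts, alpha=0.1, rng=rng)
```

With `alpha` below 1, rules with zero count tend to receive very small sampled probabilities, which is one way a prior can encourage sparsity. A full sampler would alternate a step like this with resampling the parse trees given the current probabilities.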