Paper: Parsing low-resource languages using Gibbs sampling for PCFGs with latent annotations

ACL ID D14-1035
Title Parsing low-resource languages using Gibbs sampling for PCFGs with latent annotations
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

PCFGs with latent annotations have been shown to be a very effective model for phrase structure parsing. We present a Bayesian model and algorithms based on a Gibbs sampler for parsing with a grammar with latent annotations. For PCFG-LA, we present an additional Gibbs sampler algorithm to learn annotations from training data, which are parse trees with coarse (unannotated) symbols. We show that a Gibbs sampling technique is capable of parsing sentences in a wide variety of languages and producing results that are on par with or surpass previous approaches. Our results for Kinyarwanda and Malagasy in particular demonstrate that low-resource language parsing can benefit substantially from a Bayesian approach.