Paper: Type-based MCMC for Sampling Tree Fragments from Forests

ACL ID D14-1180
Title Type-based MCMC for Sampling Tree Fragments from Forests
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

This paper applies type-based Markov Chain Monte Carlo (MCMC) algorithms to the problem of learning Synchronous Context-Free Grammar (SCFG) rules from a forest that represents all possible rules consistent with a fixed word alignment. While type-based MCMC has been shown to be effective in a number of NLP applications, our setting, where the tree structure of the sentence is itself a hidden variable, presents a number of challenges to type-based inference. We describe methods for defining variable types and efficiently indexing variables in order to overcome these challenges. These methods lead to improvements in both log likelihood and BLEU score in our experiments.