Paper: PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

ACL ID P10-1117
Title PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2010
Authors

This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer topic models as well. Adaptor Grammars (AGs) are a hierarchical, non-parametric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic m...
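The abstract's central observation is that LDA can be encoded as a PCFG: each document gets a nonterminal whose rules choose topics, and each topic gets a nonterminal whose rules emit words. The sketch below illustrates one such encoding under assumed names (`theta`, `phi`, `stop` are illustrative, not the paper's notation); the geometric `stop` probability handling document length is a simplification for the sake of the example.

```python
# Illustrative sketch (not the paper's exact construction): encode a
# two-topic, two-document LDA model as a weighted PCFG rule list.

vocab = ["gene", "protein", "market", "stock"]

# theta: document-specific topic distributions P(topic | doc)
theta = {
    "d1": {0: 0.9, 1: 0.1},
    "d2": {0: 0.2, 1: 0.8},
}

# phi: topic-specific word distributions P(word | topic)
phi = {
    0: {"gene": 0.5, "protein": 0.4, "market": 0.05, "stock": 0.05},
    1: {"gene": 0.05, "protein": 0.05, "market": 0.5, "stock": 0.4},
}

def lda_as_pcfg(theta, phi, stop=0.1):
    """Return PCFG rules (lhs, rhs, prob) encoding the LDA model.

    For each document d and topic t:
      Doc_d -> Topic_t Doc_d   with prob (1 - stop) * theta[d][t]
      Doc_d -> Topic_t         with prob stop * theta[d][t]
    For each topic t and word w:
      Topic_t -> w             with prob phi[t][w]
    """
    rules = []
    for d, topics in theta.items():
        for t, p in topics.items():
            rules.append((f"Doc_{d}", (f"Topic_{t}", f"Doc_{d}"), (1 - stop) * p))
            rules.append((f"Doc_{d}", (f"Topic_{t}",), stop * p))
    for t, words in phi.items():
        for w, p in words.items():
            rules.append((f"Topic_{t}", (w,), p))
    return rules

rules = lda_as_pcfg(theta, phi)
```

Because the rule probabilities for each nonterminal form a proper distribution, any Bayesian inference procedure for PCFG rule probabilities can recover the topic model's parameters, which is the connection the paper exploits.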