Paper: Improvements to the Bayesian Topic N-Gram Models

ACL ID D13-1118
Title Improvements to the Bayesian Topic N-Gram Models
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013

One of the language phenomena that the n-gram language model fails to capture is the topic information of a given situation. We advance the previous study of the Bayesian topic language model by Wallach (2006) in two directions: first, investigating new priors to alleviate the sparseness problem caused by dividing all n-grams into exclusive topics, and second, developing a novel Gibbs sampler that enables moving multiple n-grams across different documents to another topic. Our blocked sampler can efficiently search for higher-probability space even with higher-order n-grams. In terms of modeling assumptions, we found it effective to assign a topic to only some parts of a document.
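The core idea of a topic-conditioned n-gram model can be sketched as a per-topic bigram count table with additive (Dirichlet) smoothing; the sparseness problem arises because each topic sees only its own slice of the n-gram counts. The class and parameter names below are illustrative assumptions, not the paper's actual model or inference procedure:

```python
from collections import defaultdict

class TopicBigramLM:
    """Toy topic-conditioned bigram model with additive (Dirichlet) smoothing.

    Each topic keeps its own bigram counts, so rare (topic, context, word)
    triples fall back toward the uniform prior controlled by alpha.
    Illustrative sketch only; not the paper's model or sampler.
    """

    def __init__(self, num_topics, vocab, alpha=0.1):
        self.alpha = alpha
        self.vocab = set(vocab)
        # counts[topic][context][word] = number of observations
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(num_topics)]

    def observe(self, topic, context, word):
        """Record one bigram (context, word) under the given topic."""
        self.counts[topic][context][word] += 1

    def prob(self, topic, context, word):
        """Smoothed P(word | context, topic)."""
        c = self.counts[topic][context]
        total = sum(c.values())
        return (c[word] + self.alpha) / (total + self.alpha * len(self.vocab))

# Example: topic 0 has seen "the cat" twice and "the dog" once,
# while topic 1 has seen nothing, so it backs off to uniform.
lm = TopicBigramLM(num_topics=2, vocab={"the", "cat", "dog"})
lm.observe(0, "the", "cat")
lm.observe(0, "the", "cat")
lm.observe(0, "the", "dog")
```

Because the counts are partitioned by topic, an unseen topic-context pair yields the uniform probability 1/|V|, which is exactly the sparseness that richer priors (and block moves of whole n-gram sets in the sampler) are meant to mitigate.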