Paper: Bayesian Checking for Topic Models

ACL ID D11-1021
Title Bayesian Checking for Topic Models
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011

Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the data, where it falls short, and in which directions it might be improved.
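
As a rough illustration of posterior predictive checking in general (not the paper's topic-model procedure or its discrepancy functions, which the abstract does not specify), the sketch below checks an independence assumption on a toy binary sequence. The model (i.i.d. Bernoulli draws with a Beta(1, 1) prior), the discrepancy (number of adjacent switches), and the names `count_switches` and `posterior_predictive_p_value` are all assumptions chosen for this example.

```python
import random


def count_switches(seq):
    """Discrepancy function: number of adjacent positions whose values differ."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def posterior_predictive_p_value(observed, n_draws=2000, seed=0):
    """Estimate P(T(replicated) <= T(observed)) under an i.i.d. Bernoulli model.

    Assumed toy model: each observation is an independent Bernoulli(theta)
    draw, with a Beta(1, 1) prior on theta. This is a stand-in for a real
    topic model, used only to show the shape of the check.
    """
    rng = random.Random(seed)
    n = len(observed)
    ones = sum(observed)
    observed_stat = count_switches(observed)

    at_least_as_extreme = 0
    for _ in range(n_draws):
        # 1. Draw a parameter from the posterior, Beta(1 + ones, 1 + zeros).
        theta = rng.betavariate(1 + ones, 1 + n - ones)
        # 2. Simulate a replicated data set of the same size from the model.
        replicated = [1 if rng.random() < theta else 0 for _ in range(n)]
        # 3. Compare the discrepancy on replicated vs. observed data.
        if count_switches(replicated) <= observed_stat:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_draws


# A strongly clustered sequence, which an independence model should fit poorly.
clustered = [1] * 10 + [0] * 10
p_value = posterior_predictive_p_value(clustered)
print(f"observed switches: {count_switches(clustered)}, p-value: {p_value:.4f}")
```

A p-value near 0 or 1 suggests the observed discrepancy is unusual under the fitted model, that is, the model misses the aspect of the data the discrepancy measures. The discrepancy is user-defined, so a different function probes a different modeling assumption.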