Paper: Optimizing Semantic Coherence in Topic Models

ACL ID D11-1024
Title Optimizing Semantic Coherence in Topic Models
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2011

Latent variable models have the potential to add value to large document collections by discovering interpretable, low-dimensional subspaces. In order for people to use such models, however, they must trust them. Unfortunately, typical dimensionality reduction methods for text, such as latent Dirichlet allocation, often produce low-dimensional subspaces (topics) that are obviously flawed to human domain experts. The contributions of this paper are threefold: (1) an analysis of the ways in which topics can be flawed; (2) an automated evaluation metric for identifying such topics that does not rely on human annotators or reference collections outside the training data; (3) a novel statistical topic model based on this metric that significantly improves topic quality in a large-scale ...
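The metric described in contribution (2) scores a topic by how often its top words co-occur in training documents, needing no external reference corpus. Below is a minimal sketch of that style of co-occurrence coherence (the function name and the toy corpus are illustrative, not from the paper): for a topic's top words ordered by probability, each pair contributes the log-ratio of the pair's smoothed co-document frequency to the more probable word's document frequency.

```python
from math import log

def cooccurrence_coherence(top_words, documents):
    """Co-occurrence coherence for one topic.

    top_words: the topic's M highest-probability words, ordered from
               most to least probable.
    documents: iterable of documents, each a collection of word types.
    Assumes every word in top_words appears in at least one document
    (true when top_words come from a topic trained on this corpus).
    """
    docs = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(all(w in d for w in words) for d in docs)

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            # +1 smoothing avoids log(0) for pairs that never co-occur.
            score += log(
                (doc_freq(top_words[m], top_words[l]) + 1)
                / doc_freq(top_words[l])
            )
    return score
```

A topic whose top words frequently appear together scores near zero, while a topic mixing unrelated words accumulates large negative terms, flagging it as likely flawed.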