Paper: Shared Components Topic Models

ACL ID N12-1096
Title Shared Components Topic Models
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

With a few exceptions, extensions to latent Dirichlet allocation (LDA) have focused on the distribution over topics for each document. Much less attention has been given to the underlying structure of the topics themselves. As a result, most topic models generate topics independently from a single underlying distribution and require millions of parameters, in the form of multinomial distributions over the vocabulary. In this paper, we introduce the Shared Components Topic Model (SCTM), in which each topic is a normalized product of a smaller number of underlying component distributions. Our model learns these component distributions and the structure of how to combine subsets of them into topics. The SCTM can represent topics in a much more compact representation than LDA and ach...
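
The central construction in the abstract, a topic formed as a normalized product of a subset of shared component distributions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `topic_from_components`, the toy component values, and the choice of subsets are all hypothetical, and the sketch fixes the subsets by hand, whereas the model described above learns both the components and how to combine them.

```python
import math


def topic_from_components(components, selected):
    """Return the normalized elementwise product of the selected components.

    components: list of distributions over the vocabulary (hypothetical toy
                values; every entry is assumed to be strictly positive).
    selected:   indices of the components that contribute to this topic.
    """
    if not selected:
        raise ValueError("select at least one component")
    vocab_size = len(components[0])
    # Sum log-probabilities so products of small values do not underflow.
    log_scores = [
        sum(math.log(components[c][w]) for c in selected)
        for w in range(vocab_size)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_scores)
    unnormalized = [math.exp(s - max_log) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three toy component distributions over a four-word vocabulary.
components = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
    [0.4, 0.1, 0.4, 0.1],
]

# Each topic combines a subset of the components (subsets chosen by hand here).
subsets = [(0,), (0, 2), (1, 2), (0, 1, 2)]
topics = [topic_from_components(components, s) for s in subsets]
```

In this toy setting, four topics are built from three component distributions, so fewer vocabulary-sized parameter vectors are stored than if each topic had its own multinomial. A topic built from a single component is that component itself, while adding components concentrates mass on the words that all selected components favor.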