ACL ID | N12-1096 |
---|---|
Title | Shared Components Topic Models |
Venue | Annual Conference of the North American Chapter of the Association for Computational Linguistics |
Session | Main Conference |
Year | 2012 |
Authors |
With a few exceptions, extensions to latent Dirichlet allocation (LDA) have focused on the distribution over topics for each document. Much less attention has been given to the underlying structure of the topics themselves. As a result, most topic models generate topics independently from a single underlying distribution and require millions of parameters, in the form of multinomial distributions over the vocabulary. In this paper, we introduce the Shared Components Topic Model (SCTM), in which each topic is a normalized product of a smaller number of underlying component distributions. Our model learns these component distributions and the structure of how to combine subsets of them into topics. The SCTM can represent topics in a much more compact representation than LDA and ach...
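
The phrase "normalized product of component distributions" can be illustrated with a small sketch. This is not the paper's implementation: the function name `normalized_product`, the toy `components` values, and the binary `structure` matrix are all hypothetical, chosen only to show how a few shared distributions could be multiplied word by word and renormalized into topics. It assumes every component assigns strictly positive probability to every word, and it does not cover how the components or the structure are learned.

```python
import math


def normalized_product(components, mask):
    """Combine the selected component distributions into one topic.

    components: list of distributions over the same vocabulary
                (assumed strictly positive, so log() is defined).
    mask:       one 0/1 flag per component; 1 means the component is used.
    """
    vocab_size = len(components[0])
    # Multiply the selected components word by word, working in log space.
    log_scores = [
        sum(math.log(comp[w]) for comp, on in zip(components, mask) if on)
        for w in range(vocab_size)
    ]
    # Renormalize so the result sums to one (max subtracted for stability).
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical toy data: three components over a three-word vocabulary.
components = [
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
]

# Hypothetical structure: row k marks which components make up topic k.
structure = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]

topics = [normalized_product(components, row) for row in structure]

for k, topic in enumerate(topics):
    print(k, [round(p, 4) for p in topic])
```

In this toy setup, three topics are described by three shared components plus a small binary matrix, rather than by a separate full distribution per topic, which is the kind of compactness the abstract refers to.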