Paper: Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

ACL ID D09-1026
Title Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA’s latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA’s improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from de...
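
To make the one-to-one correspondence between topics and tags concrete, the following is a minimal sketch and not the authors' implementation. It assumes a collapsed Gibbs sampler in which each word may only be assigned to one of the tags attached to its own document. The function name labeled_lda_gibbs, the hyperparameters alpha and beta, the iteration count, and the toy corpus are all illustrative assumptions that do not come from the abstract.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, doc_labels, n_iters=100, alpha=0.5, beta=0.1, seed=0):
    """Hypothetical sketch: assign each token to one of its document's tags.

    docs       -- list of token lists
    doc_labels -- list of non-empty tag lists, one per document
    Returns (assignments, label_word), where assignments[d][i] is the tag
    credited with token i of document d, and label_word[tag][word] is a count.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    label_word = defaultdict(lambda: defaultdict(int))  # tag -> word -> count
    label_total = defaultdict(int)                      # tag -> token count
    doc_label = [defaultdict(int) for _ in docs]        # per-document tag counts
    assignments = []

    # Random initialisation, restricted to each document's own tags.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            z = rng.choice(doc_labels[d])
            z_d.append(z)
            label_word[z][w] += 1
            label_total[z] += 1
            doc_label[d][z] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            labels = doc_labels[d]
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                label_word[z][w] -= 1
                label_total[z] -= 1
                doc_label[d][z] -= 1

                # Standard collapsed-LDA weights, but only over this
                # document's tags: this is the assumed Labeled LDA constraint.
                weights = [
                    (doc_label[d][k] + alpha)
                    * (label_word[k][w] + beta)
                    / (label_total[k] + vocab_size * beta)
                    for k in labels
                ]
                z = rng.choices(labels, weights=weights, k=1)[0]

                assignments[d][i] = z
                label_word[z][w] += 1
                label_total[z] += 1
                doc_label[d][z] += 1

    return assignments, label_word


# Toy corpus (invented for illustration only).
docs = [
    ["python", "code", "java", "code"],
    ["recipe", "cook", "python", "code"],
    ["recipe", "cook", "food"],
]
doc_labels = [["programming"], ["programming", "cooking"], ["cooking"]]

assignments, label_word = labeled_lda_gibbs(docs, doc_labels)
for doc, z_d in zip(docs, assignments):
    print(list(zip(doc, z_d)))
```

In this sketch a document with a single tag credits every word to that tag, while a document with several tags splits credit word by word. The per-tag word counts in label_word are the learned word-tag correspondences.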