ACL Anthology Network (All About NLP) (beta) The Association Of Computational Linguistics Anthology Network |
ACL ID | D09-1026 |
---|---|
Title | Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora |
Venue | Conference on Empirical Methods in Natural Language Processing |
Session | Main Conference |
Year | 2009 |
Authors |
|
A significant portion of the world’s text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA’s latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA’s improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from de...
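
The constraint described in the abstract (each latent topic corresponds to exactly one tag, and a document may only use the topics of its own tags) can be illustrated with a small sketch. The code below is not taken from the paper: the function name `labeled_lda_gibbs`, the hyperparameter values, and the choice of a collapsed Gibbs sampler are all assumptions made for illustration. It shows only the core idea that each word in a document is attributed to one of that document's tags.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, labels, num_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Illustrative sampler (hypothetical helper, not the authors' code).

    docs:   list of token lists
    labels: list of non-empty tag lists, one per document
    Returns (assignments, tag_word_counts).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    tag_word = defaultdict(lambda: defaultdict(int))  # count of word w under tag k
    tag_total = defaultdict(int)                      # total words under tag k
    doc_tag = [defaultdict(int) for _ in docs]        # words in doc d under tag k
    assignments = []

    def update(d, w, k, delta):
        tag_word[k][w] += delta
        tag_total[k] += delta
        doc_tag[d][k] += delta

    # Random initialisation, restricted to each document's own tags.
    for d, doc in enumerate(docs):
        z_d = [rng.choice(labels[d]) for _ in doc]
        for w, k in zip(doc, z_d):
            update(d, w, k, +1)
        assignments.append(z_d)

    for _ in range(num_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)
                # Same form as the standard LDA conditional, but the
                # candidate topics are only the tags attached to document d.
                weights = [
                    (doc_tag[d][k] + alpha)
                    * (tag_word[k][w] + beta)
                    / (tag_total[k] + vocab_size * beta)
                    for k in labels[d]
                ]
                k_new = rng.choices(labels[d], weights=weights)[0]
                assignments[d][i] = k_new
                update(d, w, k_new, +1)

    return assignments, tag_word
```

For a document with a single tag, every word is necessarily attributed to that tag; for a multi-tagged document, the sampler splits words among the tags according to how each tag is used elsewhere in the corpus, which is the credit attribution behaviour the abstract describes.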