Paper: A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

ACL ID D12-1020
Title A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012
Authors

Topic models traditionally rely on the bag-of-words assumption. In data mining applications, this often results in end-users being presented with inscrutable lists of topical unigrams, single words inferred as representative of their topics. In this article, we present a hierarchical generative probabilistic model of topical phrases. The model simultaneously infers the location, length, and topic of phrases within a corpus and relaxes the bag-of-words assumption within phrases by using a hierarchy of Pitman-Yor processes. We use Markov chain Monte Carlo techniques for approximate inference in the model and perform slice sampling to learn its hyperparameters. We show via an experiment on human subjects that our model finds substantially better, more interpretable topical phrases...
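
For readers unfamiliar with the Pitman-Yor process named in the abstract, the following is a minimal sketch of its Chinese restaurant representation for a single restaurant. It is not the paper's hierarchical model or its inference procedure. The function name `sample_pitman_yor_crp` and its parameters are hypothetical, chosen only for illustration, and the sketch assumes a strictly positive concentration parameter to keep every seating weight positive.

```python
import random


def sample_pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers one at a time and return the resulting table sizes.

    Illustrative only: a single Pitman-Yor restaurant, not a hierarchy.
    Assumes 0 <= discount < 1 and concentration > 0.
    """
    if not (0.0 <= discount < 1.0) or concentration <= 0.0:
        raise ValueError("need 0 <= discount < 1 and concentration > 0")
    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        k = len(table_sizes)
        # Unnormalised seating weights:
        #   existing table j: size_j - discount
        #   new table:        concentration + discount * k
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[choice] += 1   # join an existing table
    return table_sizes


if __name__ == "__main__":
    sizes = sample_pitman_yor_crp(1000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest sizes:", sorted(sizes, reverse=True)[:5])
```

A larger discount opens new tables more readily and yields a heavier-tailed distribution of table sizes, which is the power-law behaviour that makes Pitman-Yor processes a common choice for modelling word and phrase frequencies.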