Paper: Review Topic Discovery with Phrases using the Pólya Urn Model

ACL ID: C14-1063
Title: Review Topic Discovery with Phrases using the Pólya Urn Model
Venue: International Conference on Computational Linguistics
Session: Main Conference
Year: 2014
Authors:

Topic modelling has been widely used to discover latent topics from text documents. Most existing models work on individual words; that is, they treat each topic as a distribution over words. However, using only individual words has several shortcomings. First, it increases the co-occurrences of words, and these co-occurrences may be incorrect because a two-word phrase is not equivalent to two separate words. These extra and often incorrect co-occurrences result in poorer output topics. A multi-word phrase should therefore be treated as a single term. Second, individual words are often difficult to use in practice because the meaning of a word inside a phrase can be quite different from its meaning in isolation. Third, topics as a list of individual words are also difficult to understand by us...
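The first shortcoming, that splitting a phrase inflates word co-occurrences, can be illustrated with a small sketch. This is not the paper's method and does not implement the Pólya urn model. It assumes co-occurrence is counted at the document level and that a phrase list is already available (for example, from a separate phrase miner). The helper names `merge_phrases` and `cooccurrences` and the toy review tokens are hypothetical.

```python
from collections import Counter
from itertools import combinations

def merge_phrases(tokens, phrases):
    """Replace known multi-word phrases with a single joined term.

    `phrases` is an assumed, pre-built set of word tuples (hypothetical input).
    """
    max_len = max((len(p) for p in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest phrase first so ("battery", "life") beats "battery".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            cand = tuple(tokens[i:i + n])
            if cand in phrases:
                out.append("_".join(cand))
                i += n
                break
        else:
            # No phrase starts here: keep the single word.
            out.append(tokens[i])
            i += 1
    return out

def cooccurrences(docs):
    """Count unordered pairs of distinct terms appearing in the same document."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts

# Toy reviews, already stripped of stopwords (hypothetical data).
reviews = [["battery", "life", "short"], ["battery", "life", "long"], ["screen", "bright"]]
phrases = {("battery", "life")}

word_level = cooccurrences(reviews)
phrase_level = cooccurrences([merge_phrases(d, phrases) for d in reviews])
print(sum(word_level.values()), sum(phrase_level.values()))  # 7 3
```

At the word level, "life" co-occurs with both "short" and "long". These are exactly the extra pairs the abstract warns about, because "life" on its own says nothing about batteries. Once "battery life" is treated as one term, only the phrase-level pairs remain.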