Paper: A Topic Model for Word Sense Disambiguation

ACL ID D07-1109
Title A Topic Model for Word Sense Disambiguation
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007

We develop latent Dirichlet allocation with WORDNET (LDAWN), an unsupervised probabilistic topic model that includes word sense as a hidden variable. We develop a probabilistic posterior inference algorithm for simultaneously disambiguating a corpus and learning the domains in which to consider each word. Using the WORDNET hierarchy, we embed the construction of Abney and Light (1999) in the topic model and show that automatically learned domains improve WSD accuracy compared to alternative contexts.
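
The idea of treating word sense as a hidden variable tied to a topic can be illustrated with a minimal sketch. It assumes a toy two-branch hierarchy, two hypothetical topics ("finance" and "nature"), and hand-set transition probabilities; none of these names or values come from the paper, and the posterior inference that would learn them is not shown. Each sense of a word corresponds to a root-to-synset path, and a topic scores a sense by the product of its transition probabilities along that path.

```python
from math import prod

# Hypothetical toy hierarchy (not from the paper): for each topic, every
# synset maps to its children with a topic-specific transition probability.
# In a real model these would be learned; here they are set by hand.
TRANSITIONS = {
    "finance": {
        "entity": {"institution": 0.9, "landform": 0.1},
        "institution": {"bank.n.01": 1.0},
        "landform": {"bank.n.02": 1.0},
    },
    "nature": {
        "entity": {"institution": 0.2, "landform": 0.8},
        "institution": {"bank.n.01": 1.0},
        "landform": {"bank.n.02": 1.0},
    },
}

# Root-to-sense paths for the two illustrative senses of "bank".
SENSE_PATHS = {
    "bank.n.01": ["entity", "institution", "bank.n.01"],
    "bank.n.02": ["entity", "landform", "bank.n.02"],
}


def path_probability(topic, path):
    """Probability of walking `path` from the root under `topic`."""
    steps = TRANSITIONS[topic]
    return prod(steps[parent][child] for parent, child in zip(path, path[1:]))


def most_likely_sense(topic, sense_paths):
    """Return the sense whose path is most probable under `topic`."""
    return max(sense_paths, key=lambda s: path_probability(topic, sense_paths[s]))
```

Under these assumed values, `most_likely_sense("finance", SENSE_PATHS)` selects the institution sense and `most_likely_sense("nature", SENSE_PATHS)` selects the landform sense, which is the sense in which a learned domain can act as disambiguating context.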