Paper: Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

ACL ID D13-1016
Title Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

Domain adaptation has been popularly stud- ied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to ad- dress domain adaptation for text classification with automatically induced discriminative la- tent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word cluster- ing on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its la- tent feature representation for domain adapta- tion. We train this latent graphical model us- ing a simple expectation-maximi...