Paper: Cross Language Text Classification by Model Translation and Semi-Supervised Learning

ACL ID D10-1103
Title Cross Language Text Classification by Model Translation and Semi-Supervised Learning
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

In this paper, we introduce a method that au- tomatically builds text classifiers in a new lan- guage by training on already labeled data in another language. Our method transfers the classification knowledge across languages by translating the model features and by using an Expectation Maximization (EM) algorithm that naturally takes into account the ambigu- ity associated with the translation of a word. We further exploit the readily available un- labeled data in the target language via semi- supervised learning, and adapt the translated model to better fit the data distribution of the target language.