Paper: Cross-Lingual Latent Topic Extraction

ACL ID P10-1115
Title Cross-Lingual Latent Topic Extraction
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2010

Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that we can apply topic models to extract shared latent topics in text data of different languages. Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft co...