Paper: Improving Probabilistic Latent Semantic Analysis With Principal Component Analysis

ACL ID E06-1014
Title Improving Probabilistic Latent Semantic Analysis With Principal Component Analysis
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2006
Authors

Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, the parameters of a PLSA model are trained using the Expectation Maximization (EM) algorithm, and as a result, the trained model is dependent on the initialization values so that performance can be highly variable. In this paper we present a method for using LSA analysis to initialize a PLSA model. We also investigated the performance of our method for the tasks of text segmentation and retrieval on personal-size corpora, and present results demonstrating the efficacy of our proposed approach.
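To make the initialization issue concrete, the following is a minimal sketch of PLSA trained with EM, started either from random values or from a truncated SVD of the term counts (the factorization underlying LSA). The abstract does not say how the LSA factors are mapped to PLSA parameters, so the mapping in `svd_init` (absolute values, then normalization) is an assumption made purely for illustration and should not be read as the paper's method. All function names and the toy count matrix are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def normalize_rows(m):
    """Scale each row so that it sums to 1."""
    return m / np.maximum(m.sum(axis=1, keepdims=True), EPS)


def random_init(n_docs, n_words, n_topics, seed):
    """Random starting point: P(w|z) as (Z, W) and P(z|d) as (D, Z)."""
    rng = np.random.default_rng(seed)
    p_w_z = normalize_rows(rng.random((n_topics, n_words)))
    p_z_d = normalize_rows(rng.random((n_docs, n_topics)))
    return p_w_z, p_z_d


def svd_init(counts, n_topics):
    """Deterministic starting point from a truncated SVD of the counts.

    ASSUMPTION: taking absolute values and normalizing is an illustrative
    mapping only; the source does not specify the mapping it uses.
    """
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    p_w_z = normalize_rows(np.abs(vt[:n_topics]) + 1e-6)
    p_z_d = normalize_rows(np.abs(u[:, :n_topics] * s[:n_topics]) + 1e-6)
    return p_w_z, p_z_d


def plsa_em(counts, p_w_z, p_z_d, n_iter=50):
    """Run EM for the model P(w|d) = sum_z P(w|z) P(z|d)."""
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w), shape (D, Z, W)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), EPS)
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = normalize_rows(weighted.sum(axis=0))
        p_z_d = normalize_rows(weighted.sum(axis=2))
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Log-likelihood of the observed counts under the model."""
    p_w_d = p_z_d @ p_w_z
    return float((counts * np.log(p_w_d + EPS)).sum())


# Toy document-by-word count matrix (hypothetical data).
counts = np.array([[4, 3, 0, 0],
                   [3, 5, 1, 0],
                   [0, 0, 4, 4],
                   [0, 1, 3, 5]], dtype=float)

for seed in (0, 1, 2):
    params = plsa_em(counts, *random_init(4, 4, 2, seed))
    print("random seed", seed, log_likelihood(counts, *params))

params = plsa_em(counts, *svd_init(counts, 2))
print("svd init", log_likelihood(counts, *params))
```

Random starts can end at different local optima depending on the seed, whereas the SVD-based start always gives the same result for a given corpus; whether it is also a better one is an empirical question that this sketch does not answer.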