Paper: Translingual Document Representations from Discriminative Projections

ACL ID: D10-1025
Title: Translingual Document Representations from Discriminative Projections
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2010
Authors:

Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a projection of documents from multiple languages into a single translingual vector space. We explore two variants to create these projections: Oriented Principal Component Analysis (OPCA) and Coupled Probabilistic Latent Semantic Analysis (CPLSA). Both variants start from a basic document model (PCA and PLSA, respectively). Each model is then made discriminative by encouraging comparable document pairs to have similar vector representations. We evaluate these algorithms on two tasks: parallel document retrieval for Wikipedia and Europarl documents, and cross-lingual text classification on Reuters. The two ...
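The abstract describes OPCA only at a high level. Below is a minimal sketch of the oriented-PCA idea, assuming the textbook formulation rather than the authors' exact setup: find a direction v that maximizes the variance of all documents (signal covariance S) relative to the mean squared difference between comparable pairs (noise covariance N), i.e. the generalized eigenproblem S v = λ (N + εI) v. The ridge term ε, the three-feature toy data, and every helper name (`opca_direction`, `project`, etc.) are hypothetical and chosen only for illustration; documents are assumed to live in one shared feature space.

```python
import math


def mat_vec(m, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def second_moment(vectors, center):
    """Covariance (center=True) or uncentered second moment (center=False)."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n if center else 0.0 for i in range(dim)]
    out = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        d = [vi - mi for vi, mi in zip(v, mean)]
        for i in range(dim):
            for j in range(dim):
                out[i][j] += d[i] * d[j] / n
    return out


def opca_direction(pairs, eps=1e-3, iters=200):
    """Top solution of S v = lambda (N + eps I) v via power iteration on (N + eps I)^-1 S."""
    docs = [doc for pair in pairs for doc in pair]
    s = second_moment(docs, center=True)  # signal: spread of all documents
    diffs = [[p - q for p, q in zip(x, y)] for x, y in pairs]
    noise = second_moment(diffs, center=False)  # noise: disagreement within pairs
    dim = len(docs[0])
    # Hypothetical ridge eps keeps N + eps I invertible when N is singular.
    a = [[noise[i][j] + (eps if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    v = [1.0] * dim
    for _ in range(iters):
        v = solve(a, mat_vec(s, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def project(v, doc):
    """Coordinate of a document along the learned direction."""
    return sum(vi * di for vi, di in zip(v, doc))


# Hypothetical toy data: feature 0 = an English topic term, feature 1 = its
# French counterpart, feature 2 = a language-specific nuisance feature.
t = [1, 2, 3, 4, -1, -2, -3, -4]
a = [0.5, -0.3, 0.8, -0.6, 0.2, -0.9, 0.4, -0.1]
b = [-0.4, 0.7, -0.2, 0.5, -0.8, 0.3, -0.6, 0.9]
pairs = [((ti, 0.0, ai), (0.0, ti, bi)) for ti, ai, bi in zip(t, a, b)]

v = opca_direction(pairs)
print([round(x, 3) for x in v])
print(project(v, pairs[0][0]), project(v, pairs[0][1]))
```

In this toy set the noise covariance is exactly zero along (1, 1, 0), so the learned direction comes out close to (1, 1, 0)/√2 and the English and French members of each pair land on nearly the same coordinate. If the noise term were dropped (N = 0), the iteration would reduce to ordinary PCA on S, which here weights the English-only and French-only axes equally; the pair constraint is what makes the projection translingual. A fuller sketch would keep the top k generalized eigenvectors rather than one.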