Paper: Probabilistic Models of Cross-Lingual Semantic Similarity in Context Based on Latent Cross-Lingual Concepts Induced from Comparable Data

ACL ID D14-1040
Title Probabilistic Models of Cross-Lingual Semantic Similarity in Context Based on Latent Cross-Lingual Concepts Induced from Comparable Data
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

We propose the first probabilistic approach to modeling cross-lingual semantic similarity (CLSS) in context which requires only comparable data. The approach relies on an idea of projecting words and sets of words into a shared latent semantic space spanned by language-pair independent latent semantic concepts (e.g., cross-lingual topics obtained by a multilingual topic model). These latent cross-lingual concepts are induced from a comparable corpus without any additional lexical resources. Word meaning is represented as a probability distribution over the latent concepts, and a change in meaning is represented as a change in the distribution over these latent concepts. We present new models that modulate the isolated out-of-context word representations with contextual knowle...
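
The out-of-context representation described in the abstract, a word as a probability distribution over shared latent concepts, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the paper's actual model: the function names `concept_distribution` and `bhattacharyya` are hypothetical, the prior over concepts is taken to be uniform, the Bhattacharyya coefficient stands in for whatever similarity function the authors use, and the per-concept word probabilities are invented toy values.

```python
import math

def concept_distribution(word_given_concept):
    """Convert per-concept word probabilities P(w | z_k) into P(z_k | w).

    Assumption: a uniform prior over concepts, so P(z_k | w) is simply
    P(w | z_k) renormalized over the K concepts.
    """
    total = sum(word_given_concept)
    return [value / total for value in word_given_concept]

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two discrete distributions.

    Equals 1.0 for identical distributions and 0.0 when they share no support.
    """
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Toy P(w | z_k) values for K = 3 shared cross-lingual concepts (invented).
source_word = concept_distribution([0.020, 0.001, 0.001])  # word in language S
candidate_a = concept_distribution([0.030, 0.002, 0.001])  # word in language T
candidate_b = concept_distribution([0.001, 0.001, 0.040])  # word in language T

# Both languages share the same K concepts, so the distributions are
# directly comparable without a bilingual dictionary.
sim_a = bhattacharyya(source_word, candidate_a)
sim_b = bhattacharyya(source_word, candidate_b)
print(sim_a > sim_b)
```

Because both words are projected onto the same set of concepts, candidates in the target language can be ranked by their similarity to the source word. The sketch covers only the isolated, out-of-context representation; modulating it with context would change `source_word` before the comparison.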