Paper: Multilingual Models for Compositional Distributed Semantics

ACL ID P14-1006
Title Multilingual Models for Compositional Distributed Semantics
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. The models do not rely on word alignments or any syntactic information and are successfully applied to a number of diverse languages. We extend our approach to learn semantic representations at the document level, too. We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art. Through qualitative analysis and the study of pivoting effects we demonstrate that our representa...
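
The idea of pulling together the embeddings of equivalent sentences while keeping dissimilar ones apart can be illustrated with a margin-based hinge loss over sentence embeddings. The sketch below is only one plausible reading of that objective: the additive composition, the squared-distance energy, the margin value, the function names, and the toy vectors are all assumptions made for illustration and are not taken from the abstract.

```python
# Illustrative sketch only: composition function, energy, margin, and data are assumed.

def compose_add(word_vectors):
    """Sentence embedding as the element-wise sum of its word vectors."""
    dim = len(word_vectors[0])
    return [sum(vec[i] for vec in word_vectors) for i in range(dim)]

def energy(a, b):
    """Squared Euclidean distance between two sentence embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hinge_loss(src, tgt, noise, margin=1.0):
    """Zero once the noise sentence is at least `margin` farther from src than tgt is."""
    return max(0.0, margin + energy(src, tgt) - energy(src, noise))

# Toy 2-d word vectors for a hypothetical parallel pair and an unrelated sentence.
en = compose_add([[0.5, 0.0], [0.0, 0.5]])
de = compose_add([[0.25, 0.25], [0.25, 0.25]])
noise = compose_add([[2.0, 0.0], [1.0, -1.0]])

aligned_loss = hinge_loss(en, de, noise)   # parallel pair is closer than the noise sentence
violated_loss = hinge_loss(en, noise, de)  # roles swapped, so the margin is violated
print(aligned_loss, violated_loss)
```

In this toy setting the loss is zero when the parallel pair is already closer than the unrelated sentence by the margin, and positive when it is not. Note that nothing in the sketch uses word alignments or syntactic structure, only sentence-level pairs.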