Paper: Improving Vector Space Word Representations Using Multilingual Correlation

ACL ID E14-1049
Title Improving Vector Space Word Representations Using Multilingual Correlation
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2014
Authors

The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually. We evaluate the resulting word representations on standard lexical semantic evaluation tasks and show that our method produces substantially better semantic representations than monolingual techniques.
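
As an illustration of the kind of CCA projection the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes that the rows of `X` and `Y` are monolingual vectors for words paired across two languages; how such pairs are obtained is not stated above. The function name `cca_project`, the ridge term `reg`, and the synthetic data are hypothetical choices made only for this example.

```python
import numpy as np

def cca_project(X, Y, k, reg=1e-8):
    """Return projection matrices A, B and the top-k canonical correlations.

    X: (n, d1) vectors in language 1; Y: (n, d2) vectors in language 2.
    Row i of X and row i of Y are assumed to describe a paired word.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)
    # Covariance matrices, with a small ridge term (an assumption) for stability.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return A, B, s[:k]

# Synthetic stand-in for paired word vectors: two noisy views of a shared latent signal.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
X = Z @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(500, 6))
Y = Z @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(500, 5))

A, B, corrs = cca_project(X, Y, k=3)
X_proj = (X - X.mean(axis=0)) @ A  # language-1 vectors in the shared space
Y_proj = (Y - Y.mean(axis=0)) @ B  # language-2 vectors in the shared space
```

Under these assumptions, `X_proj` plays the role of the multilingually informed representation: each column is the direction in the first language's space that correlates most strongly with the second language's vectors for the paired words.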