Paper: Learning Bilingual Lexicons from Monolingual Corpora

ACL ID P08-1088
Title Learning Bilingual Lexicons from Monolingual Corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008
Authors

We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characterized by purely monolingual features, such as context counts and orthographic substrings. Translations are induced using a generative model based on canonical correlation analysis, which explains the monolingual lexicons in terms of latent matchings. We show that high-precision lexicons can be learned in a variety of language pairs and from a range of corpus types.
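
As an illustration of the monolingual features mentioned in the abstract, the following is a minimal sketch of how context counts and orthographic substrings might be collected for a word type. It covers only feature extraction, not the canonical correlation analysis model or the matching step. The function names, the window size, the n-gram length, the `#` boundary marker, and the `sub=`/`ctx=` prefixes are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter, defaultdict


def orthographic_features(word, n=3):
    """Count character n-grams of a word, padded with '#' boundary markers.

    The n-gram length and the '#' marker are illustrative assumptions.
    """
    padded = "#" + word + "#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def context_counts(sentences, window=2):
    """Map each word type to counts of the words seen within `window` positions.

    `sentences` is an iterable of token lists from a monolingual corpus.
    The window size is an illustrative assumption.
    """
    contexts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[word][tokens[j]] += 1
    return contexts


def feature_vector(word, contexts, n=3):
    """Combine both feature families into one sparse vector for a word type."""
    feats = Counter()
    for gram, count in orthographic_features(word, n).items():
        feats["sub=" + gram] = count
    for ctx, count in contexts.get(word, {}).items():
        feats["ctx=" + ctx] = count
    return feats


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    contexts = context_counts(corpus, window=1)
    print(feature_vector("cat", contexts))
```

Each language's corpus would be processed separately in this way, so that every word type is described by features computed from its own language alone.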