Paper: Bilingual Lexicon Generation Using Non-Aligned Signatures

ACL ID P10-1011
Title Bilingual Lexicon Generation Using Non-Aligned Signatures
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2010

Bilingual lexicons are fundamental resources. Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. Lexicons can be generated using non-parallel corpora or a pivot language, but such lexicons are noisy. We present an algorithm for generating a high-quality lexicon from a noisy one, which only requires an independent corpus for each language. Our algorithm introduces non-aligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods. We use NAS to eliminate incorrect translations from the generated lexicon. We evaluate our method by improving the quality of noisy Spanish-Hebrew lexicons generated from two pivot...
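
The abstract does not give the NAS formula, so the following is only a rough sketch of the general idea: score a candidate translation pair by how many context words of the source word have a lexicon translation among the context words of the target word, then drop low-scoring pairs from the noisy lexicon. The function names (`overlap_score`, `filter_lexicon`), the representation of a signature as a plain set of context words, the normalization, and the `threshold` parameter are all assumptions made for illustration, not the paper's definitions.

```python
from typing import Dict, Set


def overlap_score(sig_src: Set[str], sig_tgt: Set[str],
                  lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source-signature words that have at least one
    lexicon translation inside the target signature.

    Hypothetical stand-in for a signature-based similarity score;
    no word-by-word alignment between the two signatures is computed.
    """
    if not sig_src or not sig_tgt:
        return 0.0
    matched = sum(1 for u in sig_src if lexicon.get(u, set()) & sig_tgt)
    return matched / len(sig_src)


def filter_lexicon(lexicon: Dict[str, Set[str]],
                   sigs_src: Dict[str, Set[str]],
                   sigs_tgt: Dict[str, Set[str]],
                   threshold: float) -> Dict[str, Set[str]]:
    """Keep only candidate translations whose score reaches `threshold`.

    `threshold` is an assumed tuning parameter; source words left with
    no surviving candidate are omitted from the result.
    """
    cleaned: Dict[str, Set[str]] = {}
    for word, candidates in lexicon.items():
        kept = {
            t for t in candidates
            if overlap_score(sigs_src.get(word, set()),
                             sigs_tgt.get(t, set()),
                             lexicon) >= threshold
        }
        if kept:
            cleaned[word] = kept
    return cleaned
```

In this sketch the signatures would be built separately for each language from its own monolingual corpus, which matches the abstract's requirement of an independent corpus per language; how the signature words are selected is left open here.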