Paper: Clustering Comparable Corpora For Bilingual Lexicon Extraction

ACL ID P11-2083
Title Clustering Comparable Corpora For Bilingual Lexicon Extraction
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

We study in this paper the problem of enhancing the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach for enhancing corpus comparability which exploits the homogeneity feature of the corpus, and finally preserves most of the vocabulary of the original corpus. Our experiments illustrate the well-foundedness of this method and show that the bilingual lexicons obtained from the homogeneous corpus are of better quality than the lexicons obtained with previous approaches.