Paper: Looking at Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction

ACL ID P14-1121
Title Looking at Unbalanced Specialized Comparable Corpora for Bilingual Lexicon Extraction
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that the corpora are balanced. However, the historical context-based projection method dedicated to this task is relatively insensitive to the size of each part of the comparable corpus. In this context, we study, through a series of experiments, the influence of unbalanced specialized comparable corpora on the quality of bilingual terminology extraction. Moreover, we introduce a regression model that boosts the observed word co-occurrence counts used in the context-based projection method. Our results show that using unbalanced specialized comparable corpora yields a significant gain in the quality of the extracted lexicons.
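The abstract refers to the context-based projection method without describing it. The following is a minimal sketch of that general approach as it is commonly described, not the authors' exact configuration. It builds a context vector for each word from co-occurrences within a window, translates a source vector into the target language through a bilingual seed dictionary, and ranks target words by cosine similarity. The window size, the use of raw co-occurrence counts instead of an association measure (such as mutual information or log-likelihood), the function names, and the toy data are all illustrative assumptions. The paper's regression model for boosting co-occurrence counts is not modeled here.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count co-occurrences of each word with neighbours within +/- `window` tokens.

    Assumption: raw counts are used as weights; real systems usually apply
    an association measure on top of these counts.
    """
    vecs = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return vecs


def translate_vector(vec, seed_dict):
    """Project a source context vector into the target language.

    Context words missing from the (hypothetical) seed dictionary are dropped;
    a word with several translations passes its weight to each of them.
    """
    projected = Counter()
    for word, weight in vec.items():
        for translation in seed_dict.get(word, []):
            projected[translation] += weight
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, seed_dict, k=5):
    """Return the k target words whose context vectors best match `word`'s projection."""
    projected = translate_vector(src_vecs.get(word, Counter()), seed_dict)
    scores = [(cand, cosine(projected, vec)) for cand, vec in tgt_vecs.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical toy corpora and seed dictionary, for illustration only.
    src = [["tumeur", "maligne", "sein"], ["vaccin", "grippe", "injection"]]
    tgt = [["tumor", "malignant", "breast"], ["vaccine", "flu", "injection"]]
    seed = {"maligne": ["malignant"], "sein": ["breast"],
            "grippe": ["flu"], "injection": ["injection"]}
    sv, tv = context_vectors(src), context_vectors(tgt)
    print(rank_candidates("tumeur", sv, tv, seed, k=3))
```

Because cosine similarity normalizes vector magnitudes, nothing in this sketch requires the source and target corpora to be the same size, which fits the abstract's remark that the projection method is relatively insensitive to the size of each part of the corpus. How sparse, low co-occurrence counts in a small corpus affect the ranking is the issue the paper's regression model is said to address.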