Paper: Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction

ACL ID W11-1204
Title Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction
Venue Building and Using Comparable Corpora
Session
Year 2011
Authors

This paper presents a series of experiments aimed at inducing and evaluating domain-specific bilingual lexica from comparable corpora. First, a small English-Slovene comparable corpus from health magazines was manually constructed and then used to compile a large comparable corpus on health-related topics from web corpora. Next, a bilingual lexicon for the domain was extracted from the corpus by comparing context vectors in the two languages. Evaluation of the results shows that a 2-way translation of context vectors significantly improves precision of the extracted translation equivalents. We also show that it is sufficient to increase the corpus for one language in order to obtain a higher recall, and that the increase of the number of new words is linear in the size of the ...
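
The context-vector comparison mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the raw co-occurrence counts, the cosine measure, the toy sentences, and the seed dictionary are all hypothetical. In particular, the abstract does not define "2-way translation"; here it is read as translating vectors in both directions through the seed dictionary and averaging the two similarity scores, which may differ from the authors' procedure.

```python
from collections import Counter
import math


def context_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` tokens of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def translate_vector(vector, seed):
    """Map context features through a seed dictionary; features without an entry are dropped."""
    out = Counter()
    for feature, weight in vector.items():
        if feature in seed:
            out[seed[feature]] += weight
    return out


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(f, 0) for f, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, seed_src2tgt, two_way=True):
    """Rank target-language words by context similarity to a source-language word.

    With two_way=True (an assumed reading of "2-way translation"), the score is
    the mean of source-to-target and target-to-source similarity.
    """
    seed_tgt2src = {t: s for s, t in seed_src2tgt.items()}
    src_vec = src_vecs[word]
    src_in_tgt = translate_vector(src_vec, seed_src2tgt)
    scores = {}
    for cand, tgt_vec in tgt_vecs.items():
        score = cosine(src_in_tgt, tgt_vec)
        if two_way:
            tgt_in_src = translate_vector(tgt_vec, seed_tgt2src)
            score = (score + cosine(src_vec, tgt_in_src)) / 2
        scores[cand] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up data standing in for the English and Slovene corpora.
en_sentences = [["doctor", "treats", "patient"]]
sl_sentences = [["zdravnik", "zdravi", "bolnika"], ["juha", "vsebuje", "sol"]]
seed = {"treats": "zdravi", "patient": "bolnika"}  # hypothetical seed dictionary

en_vecs = context_vectors(en_sentences)
sl_vecs = context_vectors(sl_sentences)
ranked = rank_candidates("doctor", en_vecs, sl_vecs, seed)
print(ranked[0][0])
```

On this toy data the top-ranked candidate for `doctor` is `zdravnik`, because its translated context matches in both directions. A real setup would likely use a weighting scheme such as log-likelihood or TF-IDF instead of raw counts; that choice is left open here.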