Paper: Extracting bilingual terminologies from comparable corpora

ACL ID P13-1040
Title Extracting bilingual terminologies from comparable corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2013

In this paper we present a method for extracting bilingual terminologies from comparable corpora. In our approach we treat bilingual term extraction as a classification problem. For classification we use an SVM binary classifier and training data taken from the EUROVOC thesaurus. We test our approach on a held-out test set from EUROVOC and perform precision, recall and F-measure evaluations for 20 European language pairs. The performance of our classifier reaches the 100% precision level for many language pairs. We also perform manual evaluation on bilingual terms extracted from English-German term-tagged comparable corpora. The results of this manual evaluation showed that 60-83% of the generated term pairs are exact translations and over 90% are exact or partial translations.
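The abstract frames bilingual term extraction as binary classification of candidate (source term, target term) pairs, followed by a precision/recall/F-measure evaluation. The following is a minimal sketch of that framing under stated assumptions. The abstract does not list the classifier's features, so `pair_features` (string similarity plus length and token-count ratios) is a hypothetical stand-in, not the paper's feature set. The linear SVM is trained with the Pegasos subgradient method, which is a swapped-in training procedure chosen only to keep the example dependency-free. The toy English-German pairs are illustrative and are not EUROVOC data.

```python
from difflib import SequenceMatcher


def pair_features(src: str, tgt: str) -> list[float]:
    """Hypothetical features for a candidate (source, target) term pair."""
    # Surface (cognate-style) similarity of the two strings, in [0, 1].
    sim = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
    # Character-length ratio, shorter over longer, in (0, 1].
    len_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    # Token-count ratio, so multi-word terms can align with compounds.
    ns, nt = len(src.split()), len(tgt.split())
    tok_ratio = min(ns, nt) / max(ns, nt)
    # Constant 1.0 acts as a bias feature (regularized, for simplicity).
    return [sim, len_ratio, tok_ratio, 1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style hinge-loss SVM; labels in y must be +1 or -1."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # deterministic cyclic pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # L2 shrinkage step.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step for margin violations.
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x) -> int:
    """Return +1 (accept pair as a translation) or -1 (reject)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def precision_recall_f1(gold, pred):
    """Standard precision, recall and F1 for the positive class (+1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g != 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative toy pairs only (+1 = translation, -1 = not a translation).
    pairs = [
        ("inflation", "Inflation", 1),
        ("parliament", "Parlament", 1),
        ("inflation", "Fischerei", -1),
        ("parliament", "Landwirtschaft", -1),
    ]
    X = [pair_features(s, t) for s, t, _ in pairs]
    y = [label for _, _, label in pairs]
    w = train_linear_svm(X, y)
    pred = [predict(w, x) for x in X]
    print(precision_recall_f1(y, pred))
```

In a real setup the positive pairs would come from a bilingual thesaurus such as EUROVOC, negatives would be sampled mismatched pairs, and evaluation would use a held-out split rather than the training data as above. Because the sketch reports precision separately from recall, a classifier can reach 100% precision while still missing some true pairs.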