Paper: Rare Word Translation Extraction from Aligned Comparable Documents

ACL ID P11-1133
Title Rare Word Translation Extraction from Aligned Comparable Documents
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

We present a first known result of high-precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents, in a machine learning approach. We test our hypothesis on different pairs of languages and corpora. We obtain a very high F-Measure, between 80% and 98%, for recognizing and extracting correct translations for rare terms (from 1 to 5 occurrences). Moreover, we show that our system can be trained on one pair of languages and tested on a different pair, obtaining an F-Measure of 77% for the classification of Chinese-English translations using a training corpus of Spanish-French. Our method...
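As a rough illustration of the two features named in the abstract, the sketch below computes a context-vector similarity and a co-occurrence score for one candidate translation pair. It is not the authors' implementation. The use of cosine similarity, the Jaccard overlap over aligned document pairs, and every function and variable name are assumptions made for this example. It also assumes that both context vectors have already been mapped into a shared vocabulary, for instance through a seed dictionary.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse context vectors (dict: word -> weight)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty context carries no evidence
    return dot / (norm_u * norm_v)


def cooccurrence_score(src_pairs, tgt_pairs):
    """Jaccard overlap of the aligned document pairs in which each word occurs.

    Assumption: each argument holds the indices of the aligned document
    pairs containing the source word or the target word, respectively.
    """
    src_pairs, tgt_pairs = set(src_pairs), set(tgt_pairs)
    union = src_pairs | tgt_pairs
    if not union:
        return 0.0
    return len(src_pairs & tgt_pairs) / len(union)


def candidate_features(src_ctx, tgt_ctx, src_pairs, tgt_pairs):
    """Two-dimensional feature vector for one (source, target) candidate pair."""
    return [
        cosine_similarity(src_ctx, tgt_ctx),
        cooccurrence_score(src_pairs, tgt_pairs),
    ]


# Hypothetical toy data: a rare source word and one candidate translation.
features = candidate_features(
    {"water": 2.0, "river": 1.0},  # source context, already mapped to target vocabulary
    {"water": 2.0, "river": 1.0},  # target context
    {3, 7},                        # aligned pairs containing the source word
    {3, 7, 9},                     # aligned pairs containing the target word
)
```

A feature vector like this, labelled as a correct or incorrect translation, could then be passed to any supervised classifier. Because neither feature refers to specific word forms, a model of this kind is not tied to one language pair, which fits the cross-language training result reported above.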