Paper: Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora

ACL ID: E14-4022
Title: Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora
Venue: Annual Meeting of the European Chapter of the Association for Computational Linguistics
Session: Main Conference
Year: 2014
Authors:

We describe a machine learning approach, a Random Forest (RF) classifier, that automatically compiles bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and report an improvement in translation accuracy. As an application, we combine the automatically extracted dictionary with a trained Statistical Machine Translation (SMT) system to translate unknown terms more accurately. The dictionary extraction method described in this paper is freely available.
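The abstract treats dictionary extraction as a classification problem. The sketch below illustrates that framing under stated assumptions. Each candidate (source term, target term) pair is labelled translation or non-translation. A small from-scratch Random Forest (bootstrap samples, a random feature subset tried at each split, probabilities averaged over trees) is trained on seed pairs, and target-language candidates are then ranked by the averaged probability. The three features in `pair_features` (character-bigram Dice overlap, length ratio, shared three-letter prefix), every helper name, and the toy English-Italian pairs are hypothetical illustrations. They are not the paper's feature set or data.

```python
import random


def char_ngrams(term, n=2):
    """Set of character n-grams of a lower-cased term padded with '#' markers."""
    padded = f"#{term.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def pair_features(src, tgt):
    """Hypothetical numeric features for one candidate (source, target) term pair."""
    a, b = char_ngrams(src), char_ngrams(tgt)
    dice = 2 * len(a & b) / (len(a) + len(b))  # character-bigram overlap
    len_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    same_prefix = float(src[:3].lower() == tgt[:3].lower())  # crude cognate cue
    return [dice, len_ratio, same_prefix]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a tree. Internal nodes are (feature, threshold, left, right) tuples;
    leaves hold the fraction of positive (translation) examples that reached them."""
    if depth == max_depth or len(set(y)) <= 1:
        return sum(y) / len(y)
    best = None
    # Random Forest step: only a random subset of the features is tried at each node.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled feature is constant here: make a leaf
        return sum(y) / len(y)
    _, f, t, left, right = best
    left_child, right_child = (
        build_tree([X[i] for i in idx], [y[i] for i in idx],
                   depth + 1, max_depth, n_feats, rng)
        for idx in (left, right))
    return (f, t, left_child, right_child)


def predict_tree(node, x):
    """Follow thresholds from the root down to a leaf and return its value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    """Bagged ensemble of randomised trees; probabilities are averaged over trees."""

    def __init__(self, n_trees=25, max_depth=3, n_feats=2, seed=0):
        self.n_trees, self.max_depth, self.n_feats = n_trees, max_depth, n_feats
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: len(X) draws with replacement.
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.n_feats, self.rng))
        return self

    def predict_proba(self, x):
        """Estimated probability that the pair with feature vector x is a translation."""
        return sum(predict_tree(tree, x) for tree in self.trees) / len(self.trees)


def rank_translations(forest, src, candidates):
    """Order target-language candidates by the forest's translation probability."""
    return sorted(candidates,
                  key=lambda c: forest.predict_proba(pair_features(src, c)),
                  reverse=True)


# Toy English-Italian training pairs (hypothetical data). Negatives pair each
# source term with the target of the next positive pair.
positives = [("cardiology", "cardiologia"), ("neuron", "neurone"),
             ("protein", "proteina"), ("enzyme", "enzima"),
             ("molecule", "molecola"), ("membrane", "membrana")]
negatives = [(src, positives[(i + 1) % len(positives)][1])
             for i, (src, _) in enumerate(positives)]

X = [pair_features(s, t) for s, t in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)
forest = RandomForest().fit(X, y)
print(rank_translations(forest, "genome", ["proteina", "genoma", "enzima"]))
```

The forest is written out by hand only to keep the sketch dependency-free. A practical setup would more likely use an existing implementation such as scikit-learn's `RandomForestClassifier`, together with richer features learned from real seed dictionaries.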