Paper: Using a Random Forest Classifier to recognise translations of biomedical terms across languages

ACL ID W13-2512
Title Using a Random Forest Classifier to recognise translations of biomedical terms across languages
Venue Building and Using Comparable Corpora
Session
Year 2013
Authors

We present a novel method to recognise semantic equivalents of biomedical terms in language pairs. We hypothesise that biomedical term are formed by seman- tically similar textual units across lan- guages. Based on this hypothesis, we employ a Random Forest (RF) classifier that is able to automatically mine higher order associations between textual units of the source and target language when trained on a corpus of both positive and negative examples. We apply our method on two language pairs: one that uses the same character set and another with a dif- ferent script, English-French and English- Chinese, respectively. We show that English-French pairs of terms are highly transliterated in contrast to the English- Chinese pairs. Nonetheless, our method performs robustly on both cases. We ev...