Paper: Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation

ACL ID W14-4806
Title Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation
Venue CompuTerm International Workshop On Computational Terminology
Session
Year 2014
Authors

Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminology from the extracted monolingual term lists. We manually evaluate our novel terminology extraction model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains. Furthermore, we report the performance of our monolingual terminology extraction model in comparison with a number of state-of-the-art terminology extraction models on the English-to-Hindi...
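
To make the monolingual step concrete, the following is a minimal sketch of a log-likelihood comparison between a domain corpus and a general reference corpus. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact formula, so the sketch uses the common two-corpus log-likelihood (keyness) statistic, and the names `log_likelihood` and `rank_terms`, the whitespace-level token lists, and the choice of a separate reference corpus are all hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood score for one candidate term.

    a: frequency of the term in the domain corpus
    b: frequency of the term in the reference corpus
    c: total number of tokens in the domain corpus
    d: total number of tokens in the reference corpus
    (Assumed formula; the abstract does not specify which variant is used.)
    """
    # Expected frequencies if the term were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def rank_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank candidates that are relatively more frequent in the domain corpus."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(dom.values()), sum(ref.values())
    scored = []
    for term, a in dom.items():
        b = ref.get(term, 0)
        # Keep only terms over-represented in the domain corpus.
        if a / c > b / d:
            scored.append((term, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

In this sketch, running `rank_terms` once on the source side and once on the target side of the parallel corpus would yield the two monolingual term lists. Pairing them into a bilingual termbank would then require a separate lookup against the translation model's phrase table, which is not shown here.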