Paper: EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora

ACL ID C10-2073
Title EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora
Venue International Conference on Computational Linguistics
Session Poster Session
Year 2010
Authors

In this paper, we present an unsupervised hybrid model that combines statistical, lexical, linguistic, contextual, and temporal features in a generic EM-based framework to harvest bilingual terminology from comparable corpora under a comparable-document alignment constraint. The model is configurable for any language and is extensible with additional features. Overall, it produces a considerable improvement in performance over the baseline method. In addition, the model shows a promising capability to discover new bilingual terminology with limited use of dictionaries.