Paper: Bilingual Terminology Mining - Using Brain not brawn comparable corpora

ACL ID P07-1084
Title Bilingual Terminology Mining - Using Brain not brawn comparable corpora
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2007
Authors

Current research in text mining favours the quantity of texts over their quality. But for bilingual terminology mining, and for many language pairs, large comparable corpora are not available. More importantly, as terms are defined vis-à-vis a specific domain with a restricted register, it is expected that the quality rather than the quantity of the corpus matters more in terminology mining. Our hypothesis, therefore, is that the quality of the corpus is more important than the quantity and ensures the quality of the acquired terminological resources. We show how important the type of discourse is as a characteristic of the comparable corpus.