Paper: Paradigmatic Modifiability Statistics For The Extraction Of Complex Multi-Word Terms

ACL ID H05-1106
Title Paradigmatic Modifiability Statistics For The Extraction Of Complex Multi-Word Terms
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2005
Authors

We propose a new method that sets apart domain-specific terminology from common non-specific noun phrases. It is based on the observation that terminological multi-word groups reveal a considerably lower degree of distributional variation than non-specific noun phrases. We define a measure for the observable amount of paradigmatic modifiability of terms and subsequently test it on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using a community-wide curated biomedical terminology system as an evaluation gold standard, we show that our algorithm significantly outperforms a variety of standard term identification measures. We also provide empirical evidence that our methodology is essentially domain- and corpus-size-independent....
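The abstract does not give the formula for the modifiability measure. The following Python sketch only illustrates the underlying intuition under stated assumptions, and it is not the paper's definition. For a candidate n-gram, it opens k token slots at a time, keeps the remaining tokens fixed, and compares the candidate's frequency with the summed frequency of every n-gram matching that pattern. A ratio near 1 means other words rarely fill the open slots (low paradigmatic modifiability, term-like); lower ratios mean more variation. The function names (`wildcard_key`, `modifiability_score`), the averaging over slot choices, the product over k = 1..n-1, and the toy counts are all hypothetical.

```python
from itertools import combinations


def wildcard_key(ngram, free_slots):
    """Replace the positions listed in free_slots with None (a wildcard)."""
    return tuple(None if i in free_slots else tok for i, tok in enumerate(ngram))


def modifiability_score(candidate, counts):
    """Hypothetical modifiability-style score (illustrative, not the paper's formula).

    For each k in 1..n-1 and each choice of k open slots, divide the candidate's
    frequency by the total frequency of all same-length n-grams that share the
    candidate's tokens in the fixed slots. The ratios for each k are averaged,
    and the per-k averages are multiplied. Higher scores mean less variation.
    """
    n = len(candidate)
    f_c = counts.get(candidate, 0)
    if f_c == 0:
        return 0.0
    score = 1.0
    for k in range(1, n):
        ratios = []
        for slots in combinations(range(n), k):
            free = set(slots)
            key = wildcard_key(candidate, free)
            # Sum the frequencies of all n-grams that match the wildcard pattern.
            total = sum(
                f for g, f in counts.items()
                if len(g) == n and wildcard_key(g, free) == key
            )
            ratios.append(f_c / total)
        score *= sum(ratios) / len(ratios)
    return score


# Toy bigram frequencies (invented for illustration only).
EXAMPLE_COUNTS = {
    ("gene", "expression"): 40,
    ("protein", "expression"): 10,
    ("gene", "therapy"): 10,
    ("large", "study"): 5,
    ("small", "study"): 5,
    ("recent", "study"): 5,
    ("large", "sample"): 5,
    ("large", "effect"): 5,
}

if __name__ == "__main__":
    print(modifiability_score(("gene", "expression"), EXAMPLE_COUNTS))  # 0.8
    print(modifiability_score(("large", "study"), EXAMPLE_COUNTS))      # 0.333...
```

In this toy data, "gene expression" keeps both of its slots mostly fixed (40 of 50 matches in each direction), whereas "large study" shares each slot equally with two alternatives, so it scores lower. The linear scan over all counts keeps the sketch short; a real extractor over a 104-million-word corpus would index the wildcard patterns instead.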