Paper: A Corpus-Based Approach To Automatic Compound Extraction

ACL ID P94-1033
Title A Corpus-Based Approach To Automatic Compound Extraction
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1994
Authors

An automatic compound retrieval method is proposed to extract compounds within a text message. It uses n-gram mutual information, relative frequency count, and parts of speech as the features for compound extraction. The problem is modeled as a two-class classification problem based on the distributional characteristics of n-gram tokens in the compound and the non-compound clusters. On a testing corpus of 49,314 words, the recall and precision of the proposed approach are 96.2% and 48.2% for bigram compounds and 96.6% and 39.6% for trigram compounds. A significant reduction in processing time has been observed.
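As a rough illustration of two of the features named in the abstract, the sketch below computes a mutual information score and a relative frequency for each bigram in a token list, and then scores an extracted candidate set against a reference set using recall and precision. It is a minimal sketch under stated assumptions: the function names `bigram_features` and `precision_recall` are hypothetical, the mutual information is the standard pointwise definition for bigrams (the abstract does not give the paper's n-gram formulation), and the part-of-speech feature and the two-class classifier are not modeled.

```python
import math
from collections import Counter


def bigram_features(tokens):
    """Map each bigram to (mutual_information, relative_frequency).

    Assumption: pointwise mutual information log2(P(x,y) / (P(x) P(y))),
    estimated from raw counts; this may differ from the paper's definition.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    features = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi          # relative frequency of the bigram
        p_x = unigrams[x] / n_uni    # relative frequency of the first word
        p_y = unigrams[y] / n_uni    # relative frequency of the second word
        mi = math.log2(p_xy / (p_x * p_y))
        features[(x, y)] = (mi, p_xy)
    return features


def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted candidates against a gold set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

A pattern of high recall with lower precision, as reported in the abstract, corresponds to an extracted set that covers nearly all true compounds while also containing many non-compounds.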