Paper: Retrieving Collocations From Korean Text

ACL ID W99-0610
Title Retrieving Collocations From Korean Text
Venue 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
Session Main Conference
Year 1999

This paper describes a statistical methodology for automatically retrieving collocations from POS-tagged Korean text using interrupted bigrams. The free word order of Korean makes it hard to identify collocations. We devised four statistics, 'frequency', 'randomness', 'condensation', and 'correlation', to account for the more flexible word order properties of Korean collocations. We extracted meaningful bigrams using an evaluation function and extended the bigrams to n-gram collocations by generating equivalence sets, α-covers. We view the problem of modeling n-gram collocations as one of clustering cohesive words.
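To make the notion of an interrupted bigram concrete, the following minimal Python sketch collects word pairs that co-occur within a fixed window, allowing other words to intervene, and records how often each pair occurs at each distance. It is only an illustration under stated assumptions: the function names, the `window` parameter and its value, and the toy tagged sentence are hypothetical and not taken from the paper, and the four statistics and the evaluation function are not reproduced here because the abstract does not define them.

```python
from collections import Counter, defaultdict


def interrupted_bigrams(tagged_sentence, window=3):
    """Yield (left, right, distance) for every ordered token pair
    whose positions differ by at most `window` (an assumed limit).
    Distance 1 is an adjacent pair; larger distances are 'interrupted'."""
    n = len(tagged_sentence)
    for i, left in enumerate(tagged_sentence):
        for distance in range(1, window + 1):
            j = i + distance
            if j >= n:
                break
            yield left, tagged_sentence[j], distance


def count_by_distance(sentences, window=3):
    """Map each (left, right) pair to a Counter of distance -> frequency."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for left, right, distance in interrupted_bigrams(sentence, window):
            counts[(left, right)][distance] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical POS-tagged input: (word, tag) tuples with placeholder tokens.
    corpus = [[("a", "N"), ("b", "V"), ("c", "N")]]
    for pair, by_distance in count_by_distance(corpus, window=2).items():
        print(pair, dict(by_distance))
```

A per-distance frequency table of this kind is the sort of raw material from which statistics over flexible word order could be computed; how the paper actually derives its statistics from such counts is not specified in the abstract.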