ACL ID | W99-0610 |
---|---|
Title | Retrieving Collocations From Korean Text |
Venue | 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora |
Session | Main Conference |
Year | 1999 |
Authors |
|
This paper describes a statistical methodology for automatically retrieving collocations from POS-tagged Korean text using interrupted bigrams. The free word order of Korean makes it hard to identify collocations. We devised four statistics, 'frequency', 'randomness', 'condensation', and 'correlation', to account for the more flexible word order properties of Korean collocations. We extracted meaningful bigrams using an evaluation function and extended the bigrams to n-gram collocations by generating equivalence sets, a-covers. We view the problem of modeling n-gram collocations as one of clustering cohesive words.
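To make the notion of an interrupted bigram concrete, the following is a minimal sketch of how such pairs might be collected from POS-tagged text. It is an illustration only, not the authors' implementation: the function name `interrupted_bigrams`, the `window` parameter, and the placeholder tokens are all assumptions, and the four statistics are not computed because the abstract does not define them.

```python
from collections import Counter, defaultdict


def interrupted_bigrams(tagged_tokens, window=3):
    """Collect ordered token pairs that co-occur within `window` positions.

    `tagged_tokens` is a list of (word, pos_tag) tuples. The result maps each
    ordered pair to a Counter of offsets (1 = adjacent, 2 = one token between,
    and so on). The window size is an assumed parameter, not a value taken
    from the paper.
    """
    counts = defaultdict(Counter)
    n = len(tagged_tokens)
    for i in range(n):
        for offset in range(1, window + 1):
            if i + offset >= n:
                break
            pair = (tagged_tokens[i], tagged_tokens[i + offset])
            counts[pair][offset] += 1
    return counts


# Placeholder tokens standing in for a POS-tagged sentence (hypothetical data).
tokens = [("A", "N"), ("B", "J"), ("C", "V"), ("A", "N"), ("C", "V")]
table = interrupted_bigrams(tokens, window=3)

pair = (("A", "N"), ("C", "V"))
print(dict(table[pair]))          # offset distribution for this pair
print(sum(table[pair].values()))  # total co-occurrence count within the window
```

Keeping the offset distribution for each pair, rather than a single count, is one plausible way to retain the positional information that statistics sensitive to word order would need in a free word order language.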