Paper: Word Clustering and Disambiguation Based on Co-occurrence Data

ACL ID P98-2124
Title Word Clustering and Disambiguation Based on Co-occurrence Data
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1998
Authors

We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disambiguation. We view this problem as that of estimating a joint probability distribution specifying the joint probabilities of word pairs, such as noun-verb pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability distribution. Our method is a natural extension of those proposed in (Brown et al., 1992) and (Li and Abe, 1996), and overcomes their drawbacks while retaining their advantages. We then combined this clustering method with the disambiguation method of (Li and Abe, 1995) to derive a disambiguation method that makes use...
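
To make the MDL idea concrete, the following is a minimal illustrative sketch, not the algorithm of the paper. The abstract does not spell out the model, so the sketch assumes a hard-clustering model of the form P(n, v) = P(Cn, Cv) P(n | Cn) P(v | Cv), maximum-likelihood parameter estimates, and the common model cost of (k/2) log2 N bits for k free parameters and N observed pairs. The function name `description_length` and the toy data are hypothetical.

```python
import math
from collections import Counter


def description_length(pairs, noun_class, verb_class):
    """Return the total description length in bits of `pairs`.

    Assumed model (not taken from the abstract):
        P(n, v) = P(Cn, Cv) * P(n | Cn) * P(v | Cv)
    with maximum-likelihood estimates computed from `pairs`.
    """
    total = len(pairs)
    class_pair_counts = Counter((noun_class[n], verb_class[v]) for n, v in pairs)
    noun_counts = Counter(n for n, _ in pairs)
    verb_counts = Counter(v for _, v in pairs)
    noun_class_counts = Counter(noun_class[n] for n, _ in pairs)
    verb_class_counts = Counter(verb_class[v] for _, v in pairs)

    # Data description length: negative log-likelihood of the observed pairs.
    data_bits = 0.0
    for n, v in pairs:
        cn, cv = noun_class[n], verb_class[v]
        p = (class_pair_counts[(cn, cv)] / total
             * noun_counts[n] / noun_class_counts[cn]
             * verb_counts[v] / verb_class_counts[cv])
        data_bits -= math.log2(p)

    # Model description length: (k / 2) * log2(N) for k free parameters.
    num_noun_classes = len(set(noun_class.values()))
    num_verb_classes = len(set(verb_class.values()))
    free_params = ((num_noun_classes * num_verb_classes - 1)
                   + (len(noun_class) - num_noun_classes)
                   + (len(verb_class) - num_verb_classes))
    model_bits = 0.5 * free_params * math.log2(total)
    return data_bits + model_bits


# Toy co-occurrence data (hypothetical): eight noun-verb pairs.
pairs = [("wine", "drink"), ("beer", "drink"),
         ("bread", "eat"), ("rice", "eat")] * 2
verbs = {"drink": "drink", "eat": "eat"}  # each verb is its own class

# Candidate 1: every noun is its own class.
singleton_nouns = {"wine": "wine", "beer": "beer",
                   "bread": "bread", "rice": "rice"}
# Candidate 2: nouns merged into two classes.
merged_nouns = {"wine": "beverage", "beer": "beverage",
                "bread": "food", "rice": "food"}

dl_singleton = description_length(pairs, singleton_nouns, verbs)
dl_merged = description_length(pairs, merged_nouns, verbs)
print(dl_singleton, dl_merged)
```

On this toy data both candidates fit the pairs equally well, but the merged clustering has fewer free parameters, so its total description length is smaller and MDL would prefer it. A clustering procedure in this spirit would compare candidate merges by this quantity and keep those that reduce it.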