Paper: Word Clustering and Disambiguation Based on Co-occurrence Data

ACL ID C98-2119
Title Word Clustering and Disambiguation Based on Co-occurrence Data
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1998
Authors

We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disambiguation. We view this problem as that of estimating a joint probability distribution specifying the joint probabilities of word pairs, such as noun-verb pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability distribution. Our method is a natural extension of those proposed in (Brown et al., 1992) and (Li and Abe, 1996), and overcomes their drawbacks while retaining their advantages. We then combined this clustering method with the disambiguation method of (Li and Abe, 1995) to derive a disambiguation meth...
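To make the MDL idea in the abstract concrete, here is a minimal sketch of a two-part description length for one candidate clustering of noun-verb pairs. It is an illustration under stated assumptions, not the paper's algorithm: it assumes a hard-clustering model in which P(n, v) = P(Cn, Cv) · P(n | Cn) · P(v | Cv), maximum-likelihood estimates from counts, and a model cost of (k / 2) · log2(m) bits for k free parameters and m observed pairs. The function name `description_length` and the toy data are hypothetical.

```python
import math
from collections import Counter


def description_length(pairs, noun_class, verb_class):
    """Two-part MDL score in bits for a hard clustering (illustrative assumption).

    pairs      -- list of observed (noun, verb) co-occurrences
    noun_class -- dict mapping each noun to a class label
    verb_class -- dict mapping each verb to a class label
    """
    m = len(pairs)
    pair_counts = Counter(pairs)
    class_pair, noun_tot, verb_tot = Counter(), Counter(), Counter()
    noun_class_tot, verb_class_tot = Counter(), Counter()
    for (n, v), c in pair_counts.items():
        cn, cv = noun_class[n], verb_class[v]
        class_pair[(cn, cv)] += c
        noun_tot[n] += c
        verb_tot[v] += c
        noun_class_tot[cn] += c
        verb_class_tot[cv] += c

    # Data cost: negative log-likelihood of the sample under the ML estimates.
    data_bits = 0.0
    for (n, v), c in pair_counts.items():
        cn, cv = noun_class[n], verb_class[v]
        p = (
            (class_pair[(cn, cv)] / m)
            * (noun_tot[n] / noun_class_tot[cn])
            * (verb_tot[v] / verb_class_tot[cv])
        )
        data_bits -= c * math.log2(p)

    # Model cost: (k / 2) * log2(m), with k the number of free parameters
    # of the assumed factorization.
    n_nc, n_vc = len(noun_class_tot), len(verb_class_tot)
    k = (n_nc * n_vc - 1) + (len(noun_tot) - n_nc) + (len(verb_tot) - n_vc)
    model_bits = 0.5 * k * math.log2(m)
    return model_bits + data_bits


# Toy co-occurrence data (hypothetical).
pairs = [("wine", "drink"), ("beer", "drink"), ("bread", "eat"), ("rice", "eat")]
verbs = {"drink": "V1", "eat": "V2"}
singletons = {"wine": "N1", "beer": "N2", "bread": "N3", "rice": "N4"}
merged = {"wine": "BEVERAGE", "beer": "BEVERAGE", "bread": "FOOD", "rice": "FOOD"}

dl_singletons = description_length(pairs, singletons, verbs)
dl_merged = description_length(pairs, merged, verbs)
print(dl_singletons, dl_merged)
```

On this toy sample both partitions fit the data equally well, but the merged one needs fewer parameters, so its total description length is shorter. An MDL-based clustering procedure would therefore prefer it; how the paper searches over candidate partitions is not shown here.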