Paper: Novel Association Measures Using Web Search With Double Checking

ACL ID P06-1127
Title Novel Association Measures Using Web Search With Double Checking
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006
Authors

A web search with double checking model is proposed to explore the web as a live corpus. Five association measures are presented: variants of Dice, Overlap Ratio, Jaccard, and Cosine, as well as Co-Occurrence Double Check (CODC). In experiments on Rubenstein-Goodenough’s benchmark data set, the CODC measure achieves a correlation coefficient of 0.8492, which is competitive with the performance (0.8914) of the model using WordNet. Experiments on link detection of named entities using the strategies of direct association, association matrix, and scalar association matrix verify that the double-check frequencies are reliable. Further study on named entity clustering shows that the five measures are quite useful. In particular, the CODC measure is very stable on word-word and nam...
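
For orientation, the four classical measures named in the abstract can be sketched in their textbook count-based forms. This is only an illustration under stated assumptions: the abstract does not give the paper's double-checking variants or the CODC formula, so neither is reproduced here. The function names and the arguments `fx`, `fy`, and `fxy` (occurrence counts for X, for Y, and for X and Y together) are hypothetical.

```python
import math

# Textbook association measures over raw counts.
# Assumption: fx and fy are occurrence counts of terms X and Y, and fxy is
# their co-occurrence count. These are NOT the paper's double-checking
# variants, which the abstract does not define.

def dice(fx, fy, fxy):
    """Dice coefficient: 2|X∩Y| / (|X| + |Y|)."""
    denom = fx + fy
    return 2 * fxy / denom if denom else 0.0

def jaccard(fx, fy, fxy):
    """Jaccard coefficient: |X∩Y| / |X∪Y|."""
    denom = fx + fy - fxy
    return fxy / denom if denom else 0.0

def overlap(fx, fy, fxy):
    """Overlap ratio: |X∩Y| / min(|X|, |Y|)."""
    denom = min(fx, fy)
    return fxy / denom if denom else 0.0

def cosine(fx, fy, fxy):
    """Cosine measure: |X∩Y| / sqrt(|X| * |Y|)."""
    denom = math.sqrt(fx * fy)
    return fxy / denom if denom else 0.0

if __name__ == "__main__":
    # Made-up counts, for illustration only.
    fx, fy, fxy = 100, 50, 25
    for measure in (dice, jaccard, overlap, cosine):
        print(measure.__name__, round(measure(fx, fy, fxy), 4))
```

Each function returns 0.0 when its denominator is zero, so a term with no hits does not raise a division error.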