Paper: An Integrated Approach to Measuring Semantic Similarity between Words Using Information Available on the Web

ACL ID N07-1043
Title An Integrated Approach to Measuring Semantic Similarity between Words Using Information Available on the Web
Venue Human Language Technologies
Session Main Conference
Year 2007
Authors

Measuring semantic similarity between words is vital for various applications in natural language processing, such as language modeling, information retrieval, and document clustering. We propose a method that utilizes the information available on the Web to measure semantic similarity between a pair of words or entities. We integrate page counts for each word in the pair and lexico-syntactic patterns that occur among the top ranking snippets for the AND query using support vector machines. Experimental results on Miller-Charles’ benchmark data set show that the proposed measure outperforms all the existing web based semantic similarity measures by a wide margin, achieving a correlation coefficient of 0.834. Moreover, the proposed semantic similarity measure significantly im...
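
The page-count component mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which formula is applied to the page counts, so the Jaccard-style co-occurrence score below is an assumption chosen for illustration. The function name `web_jaccard`, the `min_cooccurrence` cutoff, and the example counts are hypothetical. The lexico-syntactic pattern features and the support vector machine step are not shown.

```python
def web_jaccard(count_p: int, count_q: int, count_pq: int,
                min_cooccurrence: int = 0) -> float:
    """Jaccard-style similarity from search-engine page counts.

    count_p  -- page count for the query "P"
    count_q  -- page count for the query "Q"
    count_pq -- page count for the conjunctive query "P AND Q"

    Assumption: this scoring formula is illustrative only; the abstract
    does not specify how page counts are turned into a score.
    """
    # Treat very rare co-occurrences as noise (hypothetical cutoff).
    if count_pq <= min_cooccurrence:
        return 0.0
    # Pages containing P or Q, by inclusion-exclusion.
    union = count_p + count_q - count_pq
    return count_pq / union if union > 0 else 0.0


# Hypothetical page counts, not real search results.
score = web_jaccard(count_p=1000, count_q=500, count_pq=250)
print(score)
```

A score like this would serve as one feature among several. In the approach the abstract describes, such page-count features are combined with pattern-based features before classification.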