Paper: Evaluating Roget’s Thesauri

ACL ID P08-1048
Title Evaluating Roget’s Thesauri
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008

Roget’s Thesaurus has gone through many revisions since it was first published 150 years ago. But how do these revisions affect Roget’s usefulness for NLP? We examine the differences in content between the 1911 and 1987 versions of Roget’s, and we test both versions with each other and WordNet on problems such as synonym identification and word relatedness. We also present a novel method for measuring sentence relatedness that can be implemented in either version of Roget’s or in WordNet. Although the 1987 version of the Thesaurus is better, we show that the 1911 version performs surprisingly well and that often the differences between the versions of Roget’s and WordNet are not statistically significant. We hope that this work will encourage others to use the 1911 ...