Paper: Randomized Algorithms And NLP: Using Locality Sensitive Hash Functions For High Speed Noun Clustering

ACL ID P05-1077
Title Randomized Algorithms And NLP: Using Locality Sensitive Hash Functions For High Speed Noun Clustering
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2005
Authors

In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages. We reduce the running time from quadratic to practically linear in the number of elements to be computed.
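
To make the idea of a locality sensitive hash concrete, the sketch below shows one widely used family, random-hyperplane hashing for cosine similarity. The abstract does not say which hash family or parameters are used, so the choice of scheme, the function names (`make_hyperplanes`, `signature`, `estimated_cosine`), the dense-list vector representation, and the 64-bit signature length are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian vectors of length dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(sig_a, sig_b):
    """Number of bit positions in which two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def estimated_cosine(sig_a, sig_b):
    """Estimate cosine similarity from two bit signatures.

    For random hyperplanes, the probability that a bit differs is theta / pi,
    where theta is the angle between the vectors, so theta is approximated by
    pi * hamming / num_bits.
    """
    return math.cos(math.pi * hamming(sig_a, sig_b) / len(sig_a))


# Assumed toy feature vectors for three nouns; real vectors would be
# high-dimensional and sparse.
planes = make_hyperplanes(num_bits=64, dim=3, seed=1)
sig_a = signature([1.0, 2.0, 3.0], planes)
sig_b = signature([2.0, 4.0, 6.0], planes)    # same direction as the first
sig_c = signature([-1.0, -2.0, -3.0], planes)  # opposite direction
```

Each vector is hashed once, and similarity is then estimated from short bit signatures instead of full feature vectors. Avoiding the comparison of every pair of signatures, which is where a reduction from quadratic time would have to come from, requires an additional search step that this sketch does not cover.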