Paper: Two-Stage Hashing for Fast Document Retrieval

ACL ID P14-2081
Title Two-Stage Hashing for Fast Document Retrieval
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

This work achieves sublinear-time Nearest Neighbor Search (NNS) in massive-scale document collections. The primary contribution is a two-stage unsupervised hashing framework that integrates two state-of-the-art hashing algorithms: Locality Sensitive Hashing (LSH) and Iterative Quantization (ITQ). LSH handles neighbor candidate pruning, while ITQ provides efficient and effective reranking over the neighbor pool captured by LSH. Furthermore, the proposed hashing framework capitalizes on both term and topic similarity among documents, leading to precise document retrieval. The experimental results show that our hashing-based document retrieval approach closely approximates the conventional Information Retrieval (IR) method in terms of ...
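
The two-stage idea in the abstract (LSH for candidate pruning, then ITQ for reranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the random-hyperplane form of LSH, the synthetic random vectors standing in for document features, the bit widths, and all function names (lsh_signatures, train_itq, itq_encode, two_stage_search) are assumptions made for illustration only.

```python
import numpy as np

def lsh_signatures(X, planes):
    """Random-hyperplane LSH: one sign bit per hyperplane, packed into an integer bucket key."""
    bits = (X @ planes.T) > 0
    weights = 1 << np.arange(planes.shape[0])
    return bits.astype(np.int64) @ weights

def train_itq(X, n_bits, n_iter=50, seed=0):
    """Learn ITQ parameters: data mean, PCA projection W, and orthogonal rotation R."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD of the centered data; keep the top n_bits directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_bits].T
    V = Xc @ W
    # Start from a random orthogonal matrix.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        # Fix R, update the binary codes B.
        B = np.sign(V @ R)
        B[B == 0] = 1
        # Fix B, update R by solving the orthogonal Procrustes problem.
        U, _, Vh = np.linalg.svd(V.T @ B)
        R = U @ Vh
    return mean, W, R

def itq_encode(X, mean, W, R):
    """Binary ITQ codes as a boolean array of shape (n_samples, n_bits)."""
    return ((X - mean) @ W @ R) > 0

def two_stage_search(query, planes, keys, itq_params, codes, top_k=5):
    """Stage 1: keep documents in the query's LSH bucket. Stage 2: rerank by ITQ Hamming distance."""
    q_key = lsh_signatures(query[None, :], planes)[0]
    candidates = np.flatnonzero(keys == q_key)
    q_code = itq_encode(query[None, :], *itq_params)[0]
    hamming = np.count_nonzero(codes[candidates] != q_code, axis=1)
    order = np.argsort(hamming, kind="stable")
    return candidates[order[:top_k]]

# Synthetic stand-in for document feature vectors (assumption: 2000 docs, 64 dimensions).
rng = np.random.default_rng(42)
X = rng.standard_normal((2000, 64))

# Stage 1 index: 8 random hyperplanes give up to 256 buckets.
planes = rng.standard_normal((8, X.shape[1]))
keys = lsh_signatures(X, planes)

# Stage 2 index: 16-bit ITQ codes for every document.
mean, W, R = train_itq(X, n_bits=16)
codes = itq_encode(X, mean, W, R)

# Query with an existing document; results are indices into X.
result = two_stage_search(X[3], planes, keys, (mean, W, R), codes, top_k=5)
print(result)
```

In this sketch only the documents sharing the query's bucket are compared in stage 2, which is where the saving over a full scan comes from. A single hash table is used for brevity; using several tables to raise recall is a common LSH practice, and nothing in the abstract indicates which configuration the paper uses.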