Paper: Fast Large-Scale Approximate Graph Construction for NLP

ACL ID D12-1098
Title Fast Large-Scale Approximate Graph Construction for NLP
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012

Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate near- est neighbor search algorithms (variants of PLEB). These algorithms return the approxi- mate nearest neighbors quickly. We show our system?s efficiency in both intrinsic and ex- trinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.