Paper: Scaling to Large³ Data: An Efficient and Effective Method to Compute Distributional Thesauri

ACL ID D13-1089
Title Scaling to Large³ Data: An Efficient and Effective Method to Compute Distributional Thesauri
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

We introduce a new highly scalable approach for computing Distributional Thesauri (DTs). By employing pruning techniques and a distributed framework, we make the computation for very large corpora feasible on comparably small computational resources. We demonstrate this by releasing a DT for the whole vocabulary of Google Books syntactic n-grams. Evaluating against lexical resources using two measures, we show that our approach produces higher quality DTs than previous approaches, and is thus preferable in terms of speed and quality for large corpora.
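The abstract does not spell out the pruning steps, so the following is only an illustrative single-machine sketch of how pruning can keep a distributional thesaurus computation small. It is not the authors' method: ranking contexts by raw co-occurrence count, scoring similarity as the number of shared contexts, and every name and parameter here (build_thesaurus, max_contexts_per_word, max_words_per_context, top_k, the toy context labels) are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_thesaurus(pairs, max_contexts_per_word=2,
                    max_words_per_context=1000, top_k=3):
    """Toy distributional thesaurus from (word, context) observations.

    Assumption: contexts are ranked by raw count and similarity is the
    number of contexts two words share after pruning.
    """
    # Count how often each (word, context) pair was observed.
    counts = Counter(pairs)

    # Pruning 1: keep only the most frequent contexts of each word.
    by_word = defaultdict(list)
    for (word, ctx), n in counts.items():
        by_word[word].append((n, ctx))
    kept = defaultdict(set)  # context -> words that kept this context
    for word, items in by_word.items():
        items.sort(key=lambda t: (-t[0], t[1]))
        for _, ctx in items[:max_contexts_per_word]:
            kept[ctx].add(word)

    # Pruning 2: skip contexts shared by too many words, since they
    # would generate a quadratic number of word pairs.
    shared = defaultdict(Counter)
    for ctx, words in kept.items():
        if len(words) > max_words_per_context:
            continue
        for a, b in combinations(sorted(words), 2):
            shared[a][b] += 1
            shared[b][a] += 1

    # Keep the top_k most similar words for each entry.
    return {
        word: sorted(neigh.items(), key=lambda t: (-t[1], t[0]))[:top_k]
        for word, neigh in shared.items()
    }


# Made-up observations, purely for demonstration.
pairs = [
    ("cat", "pet_obj"), ("cat", "pet_obj"), ("cat", "purr_subj"),
    ("dog", "pet_obj"), ("dog", "pet_obj"), ("dog", "bark_subj"),
    ("car", "drive_obj"), ("truck", "drive_obj"),
]
dt = build_thesaurus(pairs)
```

In this sketch both pruning parameters bound the work per word and per context, which is the kind of property that lets such a computation be partitioned across workers in a distributed framework; the actual thresholds and ranking measure would have to be taken from the paper itself.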