Paper: A Bayesian Method for Robust Estimation of Distributional Similarities

ACL ID P10-1026
Title A Bayesian Method for Robust Estimation of Distributional Similarities
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2010
Authors

Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words’ context profiles obtained from a limited amount of data. This paper proposes a Bayesian method for robust distributional word similarities. The method uses a dis- tribution of context profiles obtained by Bayesian estimation and takes the expec- tation of a base similarity measure under that distribution. When the context pro- files are multinomial distributions, the pri- ors are Dirichlet, and the base measure is the Bhattacharyya coefficient, we can de- riveananalyticalformthatallowsefficient calculation. For the task of word similar- ityestimationusingalargeamountofWeb data in Japanese, we show that the pro- posedmeasuregivesbetteraccuraciesthan other well-kn...