Paper: Reducing Semantic Drift with Bagging and Distributional Similarity

ACL ID P09-1045
Title Reducing Semantic Drift with Bagging and Distributional Similarity
Venue Annual Meeting of the Association of Computational Linguistics
Session Main Conference
Year 2009
Authors

Iterative bootstrapping algorithms are typ- ically compared using a single set of hand- picked seeds. However, we demonstrate that performance varies greatly depend- ing on these seeds, and favourable seeds for one algorithm can perform very poorly with others, making comparisons unreli- able. We exploit this wide variation with bagging, sampling from automatically ex- tracted seeds to reduce semantic drift. However, semantic drift still occurs in later iterations. We propose an integrated distributional similarity filter to identify and censor potential semantic drifts, en- suring over 10% higher precision when ex- tracting large semantic lexicons.