Paper: Estimating effect size across datasets

ACL ID N13-1068
Title Estimating effect size across datasets
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013

Most NLP tools are applied to text that is different from the kind of text they were evaluated on. Common evaluation practice prescribes significance testing across data points in available test data, but typically we only have a single test sample. This short paper argues that in order to assess the robustness of NLP tools we need to evaluate them on diverse samples, and we consider the problem of finding the most appropriate way to estimate the true effect size of our systems over their baselines across datasets. We apply meta-analysis and show experimentally, by comparing estimated error reduction against observed error reduction on held-out datasets, that this method is significantly more predictive of success than the usual practice of using macro- or micro-averages. Fina...
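To make the contrast in the abstract concrete, the following is a minimal sketch of how an inverse-variance weighted (fixed-effect) meta-analytic estimate of effect size across datasets differs from a plain macro-average. The datasets, effect sizes, and variances below are illustrative placeholders, not numbers from the paper, and the paper's actual meta-analysis procedure may differ in detail.

```python
# Illustrative comparison: macro-average vs. inverse-variance weighted
# (fixed-effect) meta-analytic estimate of a system's error reduction
# over its baseline, measured on several test datasets.
# All numbers below are made up for illustration.

def macro_average(effects):
    """Unweighted mean of per-dataset effect sizes."""
    return sum(effects) / len(effects)

def inverse_variance_estimate(effects, variances):
    """Fixed-effect meta-analytic estimate: each dataset's effect is
    weighted by the inverse of its sampling variance, so noisier
    datasets contribute less to the pooled estimate."""
    weights = [1.0 / v for v in variances]
    weighted_sum = sum(w * e for w, e in zip(weights, effects))
    return weighted_sum / sum(weights)

# Hypothetical per-dataset error reductions and their variances.
effects = [0.12, 0.05, 0.30]
variances = [0.01, 0.002, 0.05]

print(macro_average(effects))
print(inverse_variance_estimate(effects, variances))
```

The pooled estimate is pulled toward the low-variance dataset, whereas the macro-average treats all samples as equally reliable, which is one reason a meta-analytic estimate can generalize better to unseen datasets.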