Paper: More Accurate Tests For The Statistical Significance Of Result Differences

ACL ID C00-2137
Title More Accurate Tests For The Statistical Significance Of Result Differences
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000
Authors

Statistical significance testing of differences in values of metrics like recall, precision and balanced F-score is a necessary part of empirical natural language processing. Unfortunately, we find in a set of experiments that many commonly used tests often underestimate the significance and so are less likely to detect differences that exist between different techniques. This underestimation comes from an independence assumption that is often violated. We point out some useful tests that do not make this assumption, including computationally-intensive randomization tests.

1 Introduction

In empirical natural language processing, one is often testing whether some new technique produces improved resul...
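The Python below is a minimal sketch of the kind of randomization test the abstract mentions. It is a generic paired approximate-randomization test, not necessarily the exact procedure used in the paper. It compares two systems' balanced F-scores on the same test items. It assumes, hypothetically, that each item is summarized as a `(correct, proposed, gold)` tuple of counts. The names `f_score` and `randomization_test` and the default of 10,000 trials are illustrative choices. On each trial the test swaps the two systems' outputs for each item with probability 0.5, recomputes the F-score difference, and counts how often the shuffled difference is at least as large as the observed one.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical per-item representation (an assumption, not from the paper):
# (correct answers, answers proposed by the system, answers in the key).
Counts = Tuple[int, int, int]


def f_score(items: List[Counts]) -> float:
    """Balanced F-score: harmonic mean of precision and recall over pooled counts."""
    correct = sum(c for c, _, _ in items)
    proposed = sum(p for _, p, _ in items)
    gold = sum(g for _, _, g in items)
    if correct == 0 or proposed == 0 or gold == 0:
        return 0.0
    precision = correct / proposed
    recall = correct / gold
    return 2 * precision * recall / (precision + recall)


def randomization_test(
    a: List[Counts],
    b: List[Counts],
    metric: Callable[[List[Counts]], float] = f_score,
    trials: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided approximate randomization test on paired system outputs.

    Returns an estimated p-value for the difference metric(a) - metric(b).
    """
    if len(a) != len(b):
        raise ValueError("both systems must be scored on the same items")
    rng = random.Random(seed)
    observed = abs(metric(a) - metric(b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for x, y in zip(a, b):
            # Swap the two systems' outputs on this item with probability 0.5,
            # keeping each item's pair together.
            if rng.random() < 0.5:
                x, y = y, x
            shuffled_a.append(x)
            shuffled_b.append(y)
        if abs(metric(shuffled_a) - metric(shuffled_b)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate from being exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

Because outputs are swapped only within each item pair, the test keeps the pairing between the two systems instead of treating their results as independent samples. The `+ 1` in the numerator and denominator is a common convention for Monte Carlo p-values. It also means the smallest reportable value is `1 / (trials + 1)`.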