Paper: Evaluating the Impact of Coder Errors on Active Learning

ACL ID P11-1005
Title Evaluating the Impact of Coder Errors on Active Learning
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

Active Learning (AL) has been proposed as a technique to reduce the amount of annotated data needed in the context of supervised classification. While various simulation studies for a number of NLP tasks have shown that AL works well on gold-standard data, there is some doubt whether the approach can be successful when applied to noisy, real-world data sets. This paper presents a thorough evaluation of the impact of annotation noise on AL and shows that systematic noise resulting from biased coder decisions can seriously harm the AL process. We present a method to filter out inconsistent annotations during AL and show that this makes AL far more robust when applied to noisy data.
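The abstract does not spell out the filtering mechanism, but the overall setup it describes (pool-based AL with uncertainty sampling, plus a filter that discards annotations inconsistent with the model's confident predictions) can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the paper's actual method: a toy nearest-centroid classifier over 1-D instances, margin-based uncertainty sampling, and a `noise_threshold` filter that rejects a new label when it contradicts a sufficiently confident model prediction.

```python
def train_centroids(labeled):
    """Fit a toy nearest-centroid model from (x, y) pairs, y in {0, 1}."""
    sums = {0: [0.0, 0], 1: [0.0, 0]}
    for x, y in labeled:
        sums[y][0] += x
        sums[y][1] += 1
    return {c: s / n for c, (s, n) in sums.items() if n}

def predict_margin(centroids, x):
    """Return (predicted label, margin); a small margin means 'uncertain'."""
    dists = {c: abs(x - m) for c, m in centroids.items()}
    ranked = sorted(dists, key=dists.get)
    best = ranked[0]
    margin = dists[ranked[1]] - dists[ranked[0]] if len(ranked) > 1 else 1.0
    return best, margin

def active_learn(pool, oracle, seed, rounds, noise_threshold):
    """Pool-based AL loop with a confident-disagreement noise filter.

    oracle simulates a (possibly noisy) human coder; noise_threshold is a
    hypothetical parameter: labels contradicting a prediction whose margin
    exceeds it are treated as annotation noise and discarded.
    """
    labeled = list(seed)
    pool = list(pool)
    for _ in range(rounds):
        centroids = train_centroids(labeled)
        # Uncertainty sampling: query the instance with the smallest margin.
        x = min(pool, key=lambda p: predict_margin(centroids, p)[1])
        pool.remove(x)
        y = oracle(x)
        pred, margin = predict_margin(centroids, x)
        # Filter: drop the annotation if it contradicts a confident prediction.
        if pred != y and margin > noise_threshold:
            continue
        labeled.append((x, y))
    return labeled
```

In this sketch the filter only fires when the model is already confident, so clean labels on genuinely uncertain queries (the ones AL deliberately selects) are kept, while systematically biased labels that clash with well-separated regions of the data are discarded.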