Paper: Learning with Annotation Noise

ACL ID P09-1032
Title Learning with Annotation Noise
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009

It is usually assumed that the kind of noise existing in annotated data is random classification noise. Yet there is evidence that differences between annotators are not always random attention slips but could result from different biases towards the classification categories, at least for the harder-to-decide cases. Under an annotation generation model that takes this into account, there is a hazard that some of the training instances are actually hard cases with unreliable annotations. We show that these are relatively unproblematic for an algorithm operating under the 0-1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect prediction on the uncontroversial cases at test time.
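
The distinction between random attention slips and bias-driven labels on hard cases can be illustrated with a small simulation. This is only a minimal sketch under assumptions not stated in the abstract: labels are binary, each annotator's bias is a single hypothetical probability of choosing class 1 on a hard case, and the names `annotate`, `simulate`, `disagreement_rate`, `bias`, `slip_rate`, and `hard_fraction` are invented for illustration. It is not the paper's annotation generation model.

```python
import random


def annotate(true_label, is_hard, bias, slip_rate, rng):
    """Return one annotator's binary label (0 or 1) for a single instance."""
    if is_hard:
        # Assumption: on a hard case the label reflects the annotator's
        # bias towards class 1 rather than the instance's true label.
        return 1 if rng.random() < bias else 0
    # Easy case: the correct label, flipped with a small probability
    # (a random attention slip).
    if rng.random() < slip_rate:
        return 1 - true_label
    return true_label


def simulate(n_items, hard_fraction, biases, slip_rate, seed=0):
    """Generate (true_label, is_hard, labels) tuples, one label per annotator."""
    rng = random.Random(seed)
    items = []
    for _ in range(n_items):
        true_label = rng.randint(0, 1)
        is_hard = rng.random() < hard_fraction
        labels = [annotate(true_label, is_hard, b, slip_rate, rng) for b in biases]
        items.append((true_label, is_hard, labels))
    return items


def disagreement_rate(items, hard):
    """Fraction of instances (hard or easy) on which annotators disagree."""
    subset = [labels for _, is_hard, labels in items if is_hard == hard]
    if not subset:
        return 0.0
    return sum(len(set(labels)) > 1 for labels in subset) / len(subset)


if __name__ == "__main__":
    # Two hypothetical annotators with opposite biases on hard cases.
    data = simulate(n_items=2000, hard_fraction=0.2, biases=[0.8, 0.2], slip_rate=0.02)
    print("disagreement on hard cases:", disagreement_rate(data, hard=True))
    print("disagreement on easy cases:", disagreement_rate(data, hard=False))
```

In this toy setup, disagreement concentrates on the hard cases, so a training set built from a single annotator's labels would contain hard instances whose labels depend on who annotated them, which is the hazard the abstract describes.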