Paper: Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy

ACL ID P14-1104
Title Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

Many machine learning datasets are noisy with a substantial number of mislabeled instances. This noise yields sub-optimal classification performance. In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowdsource annotations. We describe computationally cheap feature weighting techniques and a novel non-linear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances to significantly improve annotation quality at low cost. Eight different emotion extraction experiments on Twitter data demonstrate that our approach is just as effective as more computationally expensive techniques. Our techniques save a considerable amount of time.