Paper: Assessing the Costs of Sampling Methods in Active Learning for Annotation

ACL ID P08-2017
Title Assessing the Costs of Sampling Methods in Active Learning for Annotation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2008
Authors

Traditional Active Learning (AL) techniques assume that the annotation of each datum costs the same. This is not the case when annotating sequences; some sequences will take longer than others. We show that the AL technique which performs best depends on how cost is measured. Applying an hourly cost model based on the results of an annotation user study, we approximate the amount of time necessary to annotate a given sentence. This model allows us to evaluate the effectiveness of AL sampling methods in terms of time spent in annotation. We achieve a 77% reduction in hours from a random baseline to achieve 96.5% tag accuracy on the Penn Treebank. More significantly, we make the case for measuring cost in assessing AL methods.
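
To illustrate why the choice of cost measure matters, here is a minimal sketch of an hourly cost estimate for annotating sentences. The abstract does not give the form of the cost model, so the linear form, the coefficient values, and all names (Sentence, estimated_hours, cumulative_hours) are assumptions made for illustration only and are not the paper's fitted model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical coefficients in seconds. They are illustrative placeholders,
# not values taken from the paper or its user study.
SECONDS_PER_TOKEN = 4.0
SECONDS_PER_CORRECTION = 5.0
SECONDS_OVERHEAD = 12.0


@dataclass
class Sentence:
    num_tokens: int       # sentence length in tokens
    num_corrections: int  # tags the annotator is expected to change


def estimated_hours(sentence: Sentence) -> float:
    """Estimate annotation time in hours under the assumed linear model."""
    seconds = (
        SECONDS_PER_TOKEN * sentence.num_tokens
        + SECONDS_PER_CORRECTION * sentence.num_corrections
        + SECONDS_OVERHEAD
    )
    return seconds / 3600.0


def cumulative_hours(batch: List[Sentence]) -> List[float]:
    """Running total of estimated hours as each sentence is annotated."""
    totals = []
    running = 0.0
    for sentence in batch:
        running += estimated_hours(sentence)
        totals.append(running)
    return totals


# Two batches with the same number of sentences but different estimated cost.
short_batch = [Sentence(5, 1), Sentence(6, 0)]
long_batch = [Sentence(40, 6), Sentence(35, 4)]
```

Under a per-sentence cost measure the two batches above cost the same (two sentences each), while under the assumed hourly estimate the batch of long sentences costs several times more. A sampling method that favors long, uncertain sentences can therefore look efficient per sentence and expensive per hour.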