Paper: An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data

ACL ID D07-1051
Title An Approach to Text Corpus Construction which Cuts Annotation Costs and Maintains Reusability of Annotated Data
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

We consider the impact Active Learning (AL) has on effective and efficient text corpus annotation, and report reductions in annotation effort of up to 72%. We also address the question of whether a corpus annotated by means of AL – using a particular classifier and a particular feature set – can be reused to train classifiers different from the ones employed by AL, supplying alternative feature sets as well. Finally, we report on our experience with the AL paradigm under real-world conditions, i.e., the annotation of large-scale document corpora for the life sciences.