Paper: Bootstrapping POS-Taggers Using Unlabelled Data

ACL ID: W03-0407
Title: Bootstrapping POS-Taggers Using Unlabelled Data
Venue: International Conference on Computational Natural Language Learning
Session: Main Conference
Year: 2003

This paper investigates bootstrapping part-of-speech taggers using co-training, in which two taggers are iteratively re-trained on each other’s output. Since the output of the taggers is noisy, a key question is which newly labelled examples to add to the training set. We investigate selecting examples by directly maximising tagger agreement on unlabelled data, a method which has been theoretically and empirically motivated in the co-training literature. Our results show that agreement-based co-training can significantly improve tagging performance for small seed datasets. Further results show that this form of co-training considerably outperforms self-training. However, we find that simply re-training on all the newly labelled data can, in some cases, yield comparable resul...
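The co-training loop and agreement-based selection described above can be illustrated with a minimal Python sketch. It rests on loudly stated assumptions: the two "taggers" are hypothetical toy most-frequent-tag classifiers over two different word features (the word itself and its last two letters), standing in for the real taggers used in the paper, and the selection rule is a simplified greedy one (keep a newly labelled sentence only if retraining on it does not lower token-level agreement on the unlabelled data), which is not claimed to be the paper's exact procedure. The names `FeatureTagger`, `agreement`, `select_by_agreement` and `co_train` are hypothetical. The `select=False` branch corresponds to simply re-training on all newly labelled data.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sentence = List[str]
Tagged = List[Tuple[str, str]]


class FeatureTagger:
    """Hypothetical toy tagger: tags each word with the tag most often seen
    alongside one word-level feature in its training data."""

    def __init__(self, feature: Callable[[str], str], default: str = "NN"):
        self.feature = feature
        self.default = default  # fallback tag for unseen feature values
        self.table: Dict[str, str] = {}

    def train(self, data: List[Tagged]) -> None:
        counts: Dict[str, Counter] = defaultdict(Counter)
        for sent in data:
            for word, tag in sent:
                counts[self.feature(word)][tag] += 1
        self.table = {f: c.most_common(1)[0][0] for f, c in counts.items()}

    def tag(self, sent: Sentence) -> Tagged:
        return [(w, self.table.get(self.feature(w), self.default)) for w in sent]


def agreement(a: FeatureTagger, b: FeatureTagger, unlabelled: List[Sentence]) -> float:
    """Fraction of unlabelled tokens on which the two taggers assign the same tag."""
    same = total = 0
    for sent in unlabelled:
        for (_, ta), (_, tb) in zip(a.tag(sent), b.tag(sent)):
            same += ta == tb
            total += 1
    return same / total if total else 1.0


def select_by_agreement(teacher, student, student_data, candidates, unlabelled):
    """Simplified greedy selection (an assumption, not the paper's exact method):
    keep a teacher-labelled sentence only if retraining the student on it
    does not lower agreement with the teacher on the unlabelled data."""
    chosen = list(student_data)
    student.train(chosen)
    best = agreement(teacher, student, unlabelled)
    for labelled in candidates:
        student.train(chosen + [labelled])
        score = agreement(teacher, student, unlabelled)
        if score >= best:
            chosen.append(labelled)
            best = score
    student.train(chosen)  # leave the student trained on what was kept
    return chosen


def co_train(seed: List[Tagged], unlabelled: List[Sentence],
             rounds: int = 2, cache_size: int = 2, select: bool = True):
    a = FeatureTagger(lambda w: w.lower())       # view 1: the word itself
    b = FeatureTagger(lambda w: w.lower()[-2:])  # view 2: its last two letters
    data_a, data_b = list(seed), list(seed)
    a.train(data_a)
    b.train(data_b)
    pool = list(unlabelled)
    for _ in range(rounds):
        cache, pool = pool[:cache_size], pool[cache_size:]
        if not cache:
            break
        # Both taggers label the cache before either is retrained.
        from_a = [a.tag(s) for s in cache]
        from_b = [b.tag(s) for s in cache]
        if select:  # agreement-based co-training (b is updated first, then a)
            data_b = select_by_agreement(a, b, data_b, from_a, unlabelled)
            data_a = select_by_agreement(b, a, data_a, from_b, unlabelled)
        else:       # naive co-training: keep every newly labelled sentence
            data_b += from_a
            data_a += from_b
            a.train(data_a)
            b.train(data_b)
    return a, b


if __name__ == "__main__":
    seed = [
        [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
        [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    ]
    unlabelled = [["the", "cat", "runs"], ["a", "dog", "sleeps"],
                  ["the", "frog", "hops"], ["a", "bird", "sings"]]
    a, b = co_train(seed, unlabelled)
    print(a.tag(["the", "frog", "hops"]))
    # → [('the', 'DT'), ('frog', 'NN'), ('hops', 'VBZ')]
```

In this toy run the word-identity tagger, which has never seen "hops" in the seed data, learns to tag it `VBZ` from the suffix tagger's output (the suffix "ps" was seen on "sleeps"). This is the intended benefit of two different views. On data this small the naive `select=False` branch reaches the same tagging, so the example should not be read as evidence about the comparison reported in the paper.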