Paper: Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

ACL ID D10-1034
Title Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2010
Authors

Seed sampling is critical in semi-supervised learning. This paper proposes a clustering-based stratified seed sampling approach to semi-supervised learning. First, various clustering algorithms are explored to partition the unlabeled instances into different strata with each stratum represented by a center. Then, diversity-motivated intra-stratum sampling is adopted to choose the center and additional instances from each stratum to form the unlabeled seed set for an oracle to annotate. Finally, the labeled seed set is fed into a bootstrapping procedure as the initial labeled data. We systematically evaluate our stratified bootstrapping approach in the semantic relation classification subtask of the ACE RDC (Relation Detection and Classification) task. In particular, ...
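
The seed selection steps described in the abstract (cluster the unlabeled instances into strata, then pick each stratum's center plus additional diverse instances) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain k-means as the clustering algorithm, Euclidean distance over dense feature vectors, a fixed number of seeds per stratum, and a farthest-first heuristic as the diversity criterion. The names `kmeans`, `stratified_seeds`, and `per_stratum` are hypothetical and chosen only for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (an assumed choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(d) / len(members) for d in zip(*members)]
    return assignment


def stratified_seeds(points, k, per_stratum=2, seed=0):
    """Pick seed indices stratum by stratum.

    For each stratum, the instance nearest the stratum mean stands in for
    the center; further instances are added farthest-first, an assumed
    stand-in for diversity-motivated intra-stratum sampling.
    """
    assignment = kmeans(points, k, seed=seed)
    seeds = []
    for c in range(k):
        stratum = [i for i, a in enumerate(assignment) if a == c]
        if not stratum:
            continue
        mean = [sum(d) / len(stratum) for d in zip(*(points[i] for i in stratum))]
        center = min(stratum, key=lambda i: euclidean(points[i], mean))
        chosen = [center]
        while len(chosen) < min(per_stratum, len(stratum)):
            remaining = [i for i in stratum if i not in chosen]
            # Add the instance farthest from everything chosen so far.
            farthest = max(
                remaining,
                key=lambda i: min(euclidean(points[i], points[j]) for j in chosen),
            )
            chosen.append(farthest)
        seeds.extend(chosen)
    return seeds, assignment
```

In this sketch, the returned indices would be handed to an oracle for annotation, and the resulting labeled instances would serve as the initial labeled data for bootstrapping. The bootstrapping loop itself is not shown.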