Paper: Semi-Supervised Learning for Semantic Relation Classification using Stratified Sampling Strategy

ACL ID D09-1149
Title Semi-Supervised Learning for Semantic Relation Classification using Stratified Sampling Strategy
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2009
Authors

This paper presents a new approach to selecting the initial seed set, using a stratified sampling strategy, in bootstrapping-based semi-supervised learning for semantic relation classification. First, the training data is partitioned into several strata according to relation types/subtypes; then relation instances are randomly sampled from each stratum to form the initial seed set. We also investigate different augmentation strategies for iteratively adding reliable instances to the labeled set, and find that the bootstrapping procedure can be stopped at a reasonable point to significantly decrease training time without degrading performance too much. Experiments on the ACE RDC 2003 and 2004 corpora show the stratified sampling strategy contributes more than the bootstrapp...
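
The seed-selection step described in the abstract (partition by relation type, then sample randomly within each stratum) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `stratified_seed_set`, the proportional allocation of the seed budget across strata, and the rule of keeping at least one instance per stratum are all assumptions made for illustration; the abstract does not state how many instances are drawn from each stratum.

```python
import random
from collections import defaultdict


def stratified_seed_set(instances, seed_size, rng_seed=0):
    """Pick an initial seed set by stratified sampling (illustrative sketch).

    instances: list of (example, relation_type) pairs.
    seed_size: target number of seed instances (hypothetical parameter).
    """
    rng = random.Random(rng_seed)

    # Partition the labeled data into strata, one per relation type.
    strata = defaultdict(list)
    for example, relation_type in instances:
        strata[relation_type].append(example)

    total = len(instances)
    seed_set = []
    for relation_type in sorted(strata):
        members = strata[relation_type]
        # Assumption: proportional allocation, with at least one
        # instance per stratum so that no relation type is missing.
        quota = max(1, round(seed_size * len(members) / total))
        quota = min(quota, len(members))
        # Random sampling without replacement inside the stratum.
        for example in rng.sample(members, quota):
            seed_set.append((example, relation_type))
    return seed_set


# Toy data with made-up labels and a skewed type distribution.
toy = (
    [(f"role_{i}", "ROLE") for i in range(60)]
    + [(f"part_{i}", "PART") for i in range(30)]
    + [(f"social_{i}", "SOCIAL") for i in range(10)]
)
seeds = stratified_seed_set(toy, seed_size=10)
```

Because of rounding and the one-per-stratum rule, the returned set can differ slightly in size from `seed_size`. What the sketch guarantees is that every relation type present in the data is represented in the seed set, which purely random sampling does not.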