Paper: Semi-Supervised Conditional Random Fields For Improved Sequence Segmentation And Labeling

ACL ID P06-1027
Title Semi-Supervised Conditional Random Fields For Improved Sequence Segmentation And Labeling
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006
Authors

We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supe...
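
The objective described in the abstract, labeled conditional likelihood combined with unlabeled conditional entropy, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the dictionary-based `emit` and `trans` weight tables, the trade-off weight `gamma`, and the brute-force enumeration of label sequences (feasible only for toy inputs; a practical CRF would use dynamic programming instead).

```python
import itertools
import math


def sequence_score(x, y, emit, trans):
    """Unnormalized log-score of label sequence y for observations x."""
    # Emission-style features: one weight per (observation, label) pair.
    score = sum(emit.get((obs, lab), 0.0) for obs, lab in zip(x, y))
    # Transition-style features: one weight per adjacent label pair.
    score += sum(trans.get((a, b), 0.0) for a, b in zip(y, y[1:]))
    return score


def conditional_distribution(x, labels, emit, trans):
    """p(y | x) over every label sequence, by brute-force enumeration."""
    candidates = list(itertools.product(labels, repeat=len(x)))
    scores = [sequence_score(x, y, emit, trans) for y in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return {y: math.exp(s - log_z) for y, s in zip(candidates, scores)}


def conditional_entropy(dist):
    """Entropy of a distribution given as {outcome: probability}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def semi_supervised_objective(labeled, unlabeled, labels, emit, trans, gamma):
    """Labeled log-likelihood minus gamma times unlabeled entropy.

    gamma is a hypothetical trade-off weight; larger values reward
    more confident (lower-entropy) predictions on unlabeled data.
    """
    log_lik = sum(
        math.log(conditional_distribution(x, labels, emit, trans)[tuple(y)])
        for x, y in labeled
    )
    entropy = sum(
        conditional_entropy(conditional_distribution(x, labels, emit, trans))
        for x in unlabeled
    )
    return log_lik - gamma * entropy
```

Maximizing this quantity by gradient ascent from a supervised starting point would correspond to the "iterative ascent" the abstract mentions. Because the entropy term makes the objective non-concave, such a procedure can only be expected to reach a local optimum.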