Paper: Semi-Supervised Chinese Word Segmentation Using Partial-Label Learning With Conditional Random Fields

ACL ID D14-1010
Title Semi-Supervised Chinese Word Segmentation Using Partial-Label Learning With Conditional Random Fields
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

There is rich knowledge encoded in online web data. For example, punctuation and entity tags in Wikipedia data define some word boundaries in a sentence. In this paper we adopt partial-label learning with conditional random fields to make use of this valuable knowledge for semi-supervised Chinese word segmentation. The basic idea of partial-label learning is to optimize a cost function that marginalizes the probability mass in the constrained space that encodes this knowledge. By integrating some domain adaptation techniques, such as EasyAdapt, our result reaches an F-measure of 95.98% on the CTB-6 corpus, a significant improvement over both the supervised baseline and a previously proposed approach, namely constrained decode.
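
The marginalization described in the abstract can be illustrated with a small sketch. The code below is not taken from the paper; it assumes a linear-chain CRF with a hypothetical two-tag scheme (0 = word-initial, 1 = word-internal), hand-picked emission and transition scores, and per-position sets of permitted tags standing in for the boundaries that punctuation or entity markup would fix. Under those assumptions, the partial-label log-likelihood is the log of the probability mass assigned to all tag sequences consistent with the constraints, computed here with a constrained forward pass.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_z(emit, trans, allowed):
    """Forward algorithm restricted to tag paths with path[t] in allowed[t].

    emit[t][y]  : score of tag y at position t (assumed given)
    trans[p][y] : score of moving from tag p to tag y (assumed given)
    allowed[t]  : set of tags permitted at position t
    """
    n_tags = len(trans)
    alpha = [emit[0][y] if y in allowed[0] else -math.inf
             for y in range(n_tags)]
    for t in range(1, len(emit)):
        alpha = [
            emit[t][y] + logsumexp([alpha[p] + trans[p][y]
                                    for p in range(n_tags)])
            if y in allowed[t] else -math.inf
            for y in range(n_tags)
        ]
    return logsumexp(alpha)


def partial_label_log_likelihood(emit, trans, allowed):
    """log P(tag path lies in the constrained space | sentence)."""
    unconstrained = [set(range(len(trans)))] * len(emit)
    return (constrained_log_z(emit, trans, allowed)
            - constrained_log_z(emit, trans, unconstrained))


# Toy three-character sentence; all scores are illustrative only.
emit = [[0.5, -0.2], [0.1, 0.3], [1.0, -1.0]]
trans = [[0.2, 0.4], [0.6, -0.3]]
# Characters 1 and 3 are known to start a word; character 2 is unconstrained.
allowed = [{0}, {0, 1}, {0}]
ll = partial_label_log_likelihood(emit, trans, allowed)
print(ll)
```

When every position permits every tag, the quantity is zero (all probability mass is in the constrained space); when each position permits exactly one tag, it reduces to the ordinary supervised CRF log-likelihood. Training would maximize this value over the model parameters, which is omitted here.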