Paper: Domain Adaptation for CRF-based Chinese Word Segmentation using Free Annotations

ACL ID D14-1093
Title Domain Adaptation for CRF-based Chinese Word Segmentation using Free Annotations
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

Supervised methods have been the dominant approach for Chinese word segmentation. Their performance can drop significantly when the test domain differs from the training domain. In this paper, we study the problem of obtaining partial annotation from freely available data to help Chinese word segmentation on different domains. Different sources of free annotations are transformed into a unified form of partial annotation, and a variant of the CRF model is used to leverage both fully and partially annotated data consistently. Experimental results show that the Chinese word segmentation model benefits from free partially annotated data. On the SIGHAN Bakeoff 2010 data, we achieve results that are competitive with the best reported in the literature.
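The abstract does not spell out the CRF variant. The following is a minimal sketch that assumes the common partial-annotation formulation. Under that formulation, a partially annotated sentence is scored by the total probability of every tag sequence consistent with its known constraints, computed as log Z_constrained minus log Z. The sketch also assumes a BMES character tag set and that each free annotation reduces to known word spans. All names (`allowed_tags`, `log_partition`, `partial_log_likelihood`) are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Assumed BMES tag set: Begin / Middle / End of a multi-character word, Single-character word.
TAGS = ("B", "M", "E", "S")
# Valid BMES transitions; every other (prev, cur) pair is forbidden.
VALID = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
START, END = {"B", "S"}, {"E", "S"}
NEG_INF = -math.inf


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def allowed_tags(n: int, words: List[Tuple[int, int]]) -> List[Set[str]]:
    """Hypothetical unified partial annotation: known word spans [start, end)
    fix the tags inside them; characters outside every span keep all four tags."""
    allowed = [set(TAGS) for _ in range(n)]
    for start, end in words:
        if end - start == 1:
            allowed[start] = {"S"}
        else:
            allowed[start] = {"B"}
            for i in range(start + 1, end - 1):
                allowed[i] = {"M"}
            allowed[end - 1] = {"E"}
    return allowed


def log_partition(emit: List[Dict[str, float]],
                  trans: Dict[Tuple[str, str], float],
                  allowed: List[Set[str]]) -> float:
    """Forward algorithm over valid BMES paths whose tags respect `allowed`.
    `emit[i][t]` is the emission score of tag t at position i, and `trans`
    holds transition scores (missing valid pairs default to 0.0)."""
    alpha = {t: emit[0][t] if t in allowed[0] and t in START else NEG_INF
             for t in TAGS}
    for i in range(1, len(emit)):
        new_alpha = {}
        for t in TAGS:
            if t not in allowed[i]:
                new_alpha[t] = NEG_INF
                continue
            scores = [alpha[p] + trans.get((p, t), 0.0)
                      for p in TAGS if (p, t) in VALID]
            new_alpha[t] = emit[i][t] + logsumexp(scores)
        alpha = new_alpha
    return logsumexp([alpha[t] for t in END])


def partial_log_likelihood(emit: List[Dict[str, float]],
                           trans: Dict[Tuple[str, str], float],
                           words: List[Tuple[int, int]]) -> float:
    """log p(any segmentation consistent with `words` | sentence)
    = log Z_constrained - log Z_unconstrained."""
    n = len(emit)
    free = [set(TAGS) for _ in range(n)]
    return (log_partition(emit, trans, allowed_tags(n, words))
            - log_partition(emit, trans, free))
```

For example, with all-zero scores on a three-character sentence there are 2^(3-1) = 4 valid segmentations, so `log_partition` over unconstrained tags gives log 4. Marking characters 0 to 1 as one word leaves a single consistent path (B E S), so `partial_log_likelihood(emit, {}, [(0, 2)])` is -log 4. A fully annotated sentence is the special case where every character lies inside a known span. This is what lets one objective cover fully and partially annotated data uniformly.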