Paper: The CIPS-SIGHAN CLP2010 Chinese Word Segmentation Backoff

ACL ID W10-4126
Title The CIPS-SIGHAN CLP2010 Chinese Word Segmentation Backoff
Venue Joint Conference on Chinese Language Processing
Session Main Conference
Year 2010
Authors

The CIPS-SIGHAN CLP 2010 Chinese Word Segmentation Bakeoff was held in the summer of 2010 to evaluate the current state of the art in word segmentation. It focused on the cross-domain performance of Chinese word segmentation algorithms. Eighteen groups submitted 128 results over two tracks (open training and closed training), four domains (literature, computer science, medicine and finance) and two subtasks (simplified Chinese and traditional Chinese). We found that, compared with previous Chinese word segmentation bakeoffs, cross-domain segmentation performance is not much lower, and out-of-vocabulary recall has improved.
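
For readers unfamiliar with the out-of-vocabulary (OOV) recall figure mentioned above, the following is a minimal sketch of how word-level precision, recall, F1 and OOV recall are conventionally computed for a segmenter. The abstract does not describe the bakeoff's scoring procedure, so the definitions here are an assumption, and the function names (`to_spans`, `score`), the example sentence and the toy training vocabulary are all hypothetical.

```python
def to_spans(words):
    """Map a segmented sentence to the set of (start, end) character offsets of its words."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def score(gold, pred, train_vocab):
    """Return (precision, recall, f1, oov_recall) for one sentence.

    Assumption: a predicted word counts as correct only if both of its
    boundaries match a gold word; a gold word is OOV if it is absent from
    the training vocabulary.
    """
    gold_spans, pred_spans = to_spans(gold), to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0

    # OOV recall: share of gold OOV words that the segmenter recovered exactly.
    oov_total = oov_hit = 0
    start = 0
    for word in gold:
        span = (start, start + len(word))
        start += len(word)
        if word not in train_vocab:
            oov_total += 1
            oov_hit += span in pred_spans
    oov_recall = oov_hit / oov_total if oov_total else None
    return precision, recall, f1, oov_recall


# Hypothetical example: the out-of-domain term "蛋白质" is unseen in training
# and the segmenter splits it incorrectly.
gold = ["我们", "研究", "蛋白质", "结构"]
pred = ["我们", "研究", "蛋白", "质", "结构"]
vocab = {"我们", "研究", "结构"}
precision, recall, f1, oov_recall = score(gold, pred, vocab)
```

Comparing character-offset spans rather than word strings keeps repeated words in one sentence from being matched against the wrong occurrence. In a cross-domain setting, OOV recall isolates how well a system handles domain-specific terms that never appeared in its training data.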