Paper: A Boundary-Oriented Chinese Segmentation Method Using N-Gram Mutual Information

ACL ID W10-4131
Title A Boundary-Oriented Chinese Segmentation Method Using N-Gram Mutual Information
Venue Joint Conference on Chinese Language Processing
Session Main Conference
Year 2010
Authors

This paper describes our participation in the Chinese word segmentation task of CIPS-SIGHAN 2010. We implemented an n-gram mutual information (NGMI) based segmentation algorithm that mixes features from unsupervised, supervised, and dictionary-based segmentation methods. The algorithm is also combined with a simple strategy for out-of-vocabulary (OOV) word recognition. The evaluation shows encouraging results for our system in both open and closed training. The results for OOV word recognition in the closed training evaluation, however, were unsatisfactory.
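
The abstract does not spell out how NGMI is computed, so the following is only a rough illustration of the general idea behind boundary-oriented segmentation with mutual information, not the authors' algorithm. It is a simplified sketch that assumes character bigrams only: it scores each gap between adjacent characters with pointwise mutual information (PMI) and places a word boundary wherever the score falls below a threshold. The function names (`train_counts`, `pmi`, `segment`), the toy corpus, and the threshold value are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count characters and adjacent character pairs in raw, unsegmented text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams


def pmi(left, right, unigrams, bigrams):
    """Pointwise mutual information of two adjacent characters.

    Returns -inf for pairs never seen together (an assumption of this
    sketch; a real system would smooth the counts instead).
    """
    joint = bigrams[left + right]
    if joint == 0 or unigrams[left] == 0 or unigrams[right] == 0:
        return float("-inf")
    p_xy = joint / sum(bigrams.values())
    p_x = unigrams[left] / sum(unigrams.values())
    p_y = unigrams[right] / sum(unigrams.values())
    return math.log(p_xy / (p_x * p_y))


def segment(sentence, unigrams, bigrams, threshold=0.0):
    """Insert a boundary at every gap whose PMI is below the threshold."""
    if not sentence:
        return []
    words = [sentence[0]]
    for prev, cur in zip(sentence, sentence[1:]):
        if pmi(prev, cur, unigrams, bigrams) < threshold:
            words.append(cur)   # weak association: start a new word
        else:
            words[-1] += cur    # strong association: extend the current word
    return words


# Toy corpus, invented for illustration only.
corpus = ["北京大学", "北京", "大学", "北京人", "大学生"]
unigrams, bigrams = train_counts(corpus)
print(segment("北京大学", unigrams, bigrams, threshold=1.5))
```

The threshold here is hand-picked for the toy data. The system described in the abstract additionally draws on supervised and dictionary-based features and a separate OOV strategy, none of which this sketch attempts to reproduce.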