Paper: HMM Revises Low Marginal Probability by CRF for Chinese Word Segmentation

ACL ID W10-4128
Title HMM Revises Low Marginal Probability by CRF for Chinese Word Segmentation
Venue Joint Conference on Chinese Language Processing
Session Main Conference
Year 2010
Authors

This paper presents a Chinese word segmentation system for CIPS-SIGHAN 2010 Chinese language processing task. Firstly, based on Conditional Random Field (CRF) model, with local features and global features, the character-based tagging model is designed. Secondly, Hidden Markov Models (HMM) is used to revise the substrings with low marginal probability by CRF. Finally, confidence measure is used to regenerate the result and simple rules to deal with the strings within letters and numbers. As is well known that character-based approach has outstanding capability of discovering out-of-vocabulary (OOV) word, but ex- ternal information of word lost. HMM makes use of word information to in- crease in-vocabulary (IV) recall. We par- ticipate in the simplified Chinese word segmen...