Paper: Chinese Segmentation And New Word Detection Using Conditional Random Fields

ACL ID C04-1081
Title Chinese Segmentation And New Word Detection Using Conditional Random Fields
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2004
Authors

Chinese word segmentation is a difficult, important and widely-studied sequence modeling problem. This paper demonstrates the ability of linear-chain conditional random fields (CRFs) to perform robust and accurate Chinese word segmentation by providing a principled framework that easily supports the integration of domain knowledge in the form of multiple lexicons of characters and words. We also present a probabilistic new word detection method, which further improves performance. Our system is evaluated on four datasets used in a recent comprehensive Chinese word segmentation competition. State-of-the-art performance is obtained.
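
To illustrate how segmentation can be treated as a sequence modeling problem, the following minimal sketch casts it as per-character tagging. It assumes a two-tag scheme in which each character either starts a word or continues one; the tag names `START` and `NONSTART` and the helper functions `words_to_tags` and `tags_to_words` are hypothetical and chosen for illustration only. The sketch covers just the conversion between segmented text and tag sequences, not the CRF model, the lexicon features, or the new word detection method.

```python
def words_to_tags(words):
    """Turn a list of non-empty words into one tag per character.

    Assumed scheme: the first character of each word is tagged START,
    every following character of that word is tagged NONSTART.
    """
    tags = []
    for word in words:
        tags.extend(["START"] + ["NONSTART"] * (len(word) - 1))
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from a character sequence and its predicted tags."""
    words = []
    for ch, tag in zip(chars, tags):
        # Open a new word on START, or when no word has been opened yet.
        if tag == "START" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


segmented = ["中国", "人民"]
tags = words_to_tags(segmented)
recovered = tags_to_words("".join(segmented), tags)
```

Under this assumed scheme, a sequence model such as a linear-chain CRF would predict the tag sequence for an unsegmented sentence, and `tags_to_words` would turn that prediction back into words.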