ACL ID | C04-1081 |
---|---|
Title | Chinese Segmentation And New Word Detection Using Conditional Random Fields |
Venue | International Conference on Computational Linguistics |
Session | Main Conference |
Year | 2004 |
Authors |
|
Chinese word segmentation is a difficult, important and widely-studied sequence modeling problem. This paper demonstrates the ability of linear-chain conditional random fields (CRFs) to perform robust and accurate Chinese word segmentation by providing a principled framework that easily supports the integration of domain knowledge in the form of multiple lexicons of characters and words. We also present a probabilistic new word detection method, which further improves performance. Our system is evaluated on four datasets used in a recent comprehensive Chinese word segmentation competition. State-of-the-art performance is obtained.
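
As a rough illustration of how segmentation can be treated as a sequence modeling problem, the sketch below labels each character as either beginning a word or continuing one, and derives binary lexicon-membership features of the kind a linear-chain CRF could consume. The two-tag scheme ("B"/"I"), the function names, and the example lexicons are assumptions made for illustration; they are not taken from the paper, and CRF training and decoding are omitted.

```python
from typing import Dict, List, Set


def words_to_tags(words: List[str]) -> List[str]:
    """Label each character 'B' if it begins a word, otherwise 'I'."""
    tags: List[str] = []
    for word in words:
        if not word:  # skip empty tokens so no stray 'B' is emitted
            continue
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the word sequence from characters and their tags."""
    words: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


def lexicon_features(chars: str, i: int,
                     lexicons: Dict[str, Set[str]]) -> Dict[str, bool]:
    """Binary features: is the character (or bigram) at position i in each lexicon?"""
    unigram = chars[i]
    bigram = chars[i:i + 2]
    feats: Dict[str, bool] = {}
    for name, entries in lexicons.items():
        feats[f"{name}:c0"] = unigram in entries
        feats[f"{name}:c0c1"] = len(bigram) == 2 and bigram in entries
    return feats


# Hypothetical example data, not from the paper's datasets.
words = ["我们", "喜欢", "语言学"]
sentence = "".join(words)
tags = words_to_tags(words)
lexicons = {"surname": {"王"}, "word": {"我们", "喜欢"}}
feats = lexicon_features(sentence, 0, lexicons)
```

In this formulation, a trained sequence model would predict `tags` from per-character features such as `feats`, and `tags_to_words` would turn the predicted tags back into a segmentation.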