Paper: Improved Source-Channel Models For Chinese Word Segmentation

ACL ID P03-1035
Title Improved Source-Channel Models For Chinese Word Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2003

This paper presents a Chinese word segmentation system that uses improved source-channel models of Chinese sentence generation. Chinese words are defined as one of the following four types: lexicon words, morphologically derived words, factoids, and named entities. Our system provides a unified approach to the four fundamental features of word-level Chinese language processing: (1) word segmentation, (2) morphological analysis, (3) factoid detection, and (4) named entity recognition. The performance of the system is evaluated on a manually annotated test set, and is also compared with several state-of-the-art systems, taking into account the fact that the definition of Chinese words often varies from system to system.
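
As a rough illustration of the general source-channel idea, a segmenter can choose the word sequence W that maximizes P(W) P(S|W) for an input character string S. The sketch below is not the authors' system: it assumes a unigram source model over a hypothetical toy lexicon (TOY_LEXICON, with invented probabilities), treats the channel probability as 1 whenever a lexicon word spells out the characters exactly, and backs off to a fixed hypothetical penalty (UNKNOWN_LOGP) for unknown single characters. It covers only the lexicon-word case and ignores morphologically derived words, factoids, and named entities.

```python
import math

# Hypothetical toy lexicon: word -> unigram probability P(w).
# All values are invented for illustration only.
TOY_LEXICON = {
    "北京": 0.02,
    "大学": 0.03,
    "北京大学": 0.01,
    "大学生": 0.008,
    "学生": 0.02,
    "生": 0.005,
    "大": 0.004,
    "学": 0.002,
    "北": 0.001,
    "京": 0.001,
}
MAX_WORD_LEN = 4                  # longest candidate word considered (assumption)
UNKNOWN_LOGP = math.log(1e-8)     # hypothetical penalty for an unknown single character


def segment(sentence):
    """Return the most probable segmentation under the toy unigram model."""
    n = len(sentence)
    # best[i] = (best log-probability of sentence[:i], start index of the last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            if word in TOY_LEXICON:
                logp = math.log(TOY_LEXICON[word])
            elif len(word) == 1:
                logp = UNKNOWN_LOGP   # keep every sentence segmentable
            else:
                continue              # unknown multi-character strings are not candidates
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Follow the back-pointers from the end of the sentence to recover the words.
    words = []
    pos = n
    while pos > 0:
        start = best[pos][1]
        words.append(sentence[start:pos])
        pos = start
    return words[::-1]


if __name__ == "__main__":
    print(segment("北京大学生"))
```

With these invented probabilities, the dynamic program compares every way of covering the string with lexicon words and keeps the highest-scoring path; a real system would replace the unigram scores with a trained language model and add class-specific channel models.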