Paper: Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation

ACL ID P11-1141
Title Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

Lots of Chinese characters are very productive in that they can form many structured words either as prefixes or as suffixes. Previous research in Chinese word segmentation mainly focused on identifying only the word boundaries without considering the rich internal structures of many words. In this paper we argue that this is unsatisfying in many ways, both practically and theoretically. Instead, we propose that word structures should be recovered in morphological analysis. An elegant approach for doing this is given and the result is shown to be promising enough for encouraging further effort in this direction. Our probability model is trained with the Penn Chinese Treebank and actually is able to parse both word and phrase structures in a unified way.

1 Why Parse Word Structu...