Paper: Chinese Comma Disambiguation for Discourse Analysis

ACL ID P12-1083
Title Chinese Comma Disambiguation for Discourse Analysis
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012

The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structure-oriented classification of the comma that can be automatically extracted from the Chinese Treebank based on syntactic patterns. We then experiment with two supervised learning methods that automatically disambiguate the Chinese comma based on this classification. The first method integrates comma classification into parsing, and the second method adopts a "post-processing" approach that extracts features from automatic parses to train a classifier. The experimental results show that the second approach compares favorably against the first approach.
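
The "post-processing" idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the features, the label set, or the learner, so everything below is an assumption made for illustration. POS-tagged tokens stand in for features drawn from automatic parses, the two labels (+1 for a comma separating clauses, -1 for any other comma) are hypothetical, and a simple perceptron is swapped in as the classifier.

```python
from collections import defaultdict

COMMA = "，"  # full-width Chinese comma

def comma_features(tagged, i):
    """Hypothetical features for the comma at index i in a list of (word, pos) pairs."""
    left = [pos for _, pos in tagged[:i]]
    right = [pos for _, pos in tagged[i + 1:]]
    return {
        "prev_pos=" + (left[-1] if left else "BOS"),
        "next_pos=" + (right[0] if right else "EOS"),
        "left_has_verb=" + str(any(p.startswith("V") for p in left)),
        "right_has_verb=" + str(any(p.startswith("V") for p in right)),
    }

def train_perceptron(examples, epochs=10):
    """Train on (feature_set, label) pairs, with label in {+1, -1}."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(weights[f] for f in feats)
            pred = 1 if score > 0 else -1
            if pred != label:
                # Standard perceptron update on a mistake.
                for f in feats:
                    weights[f] += label
    return weights

def predict(weights, feats):
    return 1 if sum(weights.get(f, 0.0) for f in feats) > 0 else -1

# Toy data (invented for illustration, not taken from the Chinese Treebank).
sent_a = [("他", "PN"), ("来", "VV"), ("了", "AS"), (COMMA, "PU"),
          ("我", "PN"), ("走", "VV"), ("了", "AS")]      # comma between two clauses
sent_b = [("苹果", "NN"), (COMMA, "PU"), ("香蕉", "NN"),
          ("都", "AD"), ("好吃", "VA")]                   # comma inside a noun list

examples = [
    (comma_features(sent_a, 3), 1),
    (comma_features(sent_b, 1), -1),
]
weights = train_perceptron(examples)
```

Calling `predict(weights, comma_features(sent_a, 3))` then classifies the comma in the first toy sentence. A realistic system would replace the POS-based features with ones read off the parser's output trees and use the paper's own comma classes.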