Paper: TBL-Improved Non-Deterministic Segmentation and POS Tagging for a Chinese Parser

ACL ID E09-1031
Title TBL-Improved Non-Deterministic Segmentation and POS Tagging for a Chinese Parser
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2009
Authors

Although a lot of progress has been made recently in word segmentation and POS tagging for Chinese, the output of cur- rent state-of-the-art systems is too inaccu- rate to allow for syntactic analysis based on it. We present an experiment in im- proving the output of an off-the-shelf mod- ule that performs segmentation and tag- ging, the tokenizer-tagger from Beijing University (PKU). Our approach is based on transformation-based learning (TBL). Unlike in other TBL-based approaches to the problem, however, both obligatory and optional transformation rules are learned, so that the final system can output multi- ple segmentation and POS tagging anal- yses for a given input. By allowing for a small amount of ambiguity in the out- put of the tokenizer-tagger, we achieve a very considerable imp...