Paper: A Maximum Entropy Chinese Character-Based Parser

ACL ID W03-1025
Title A Maximum Entropy Chinese Character-Based Parser
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2003
  • Xiaoqiang Luo (IBM T.J. Watson Research Center, Yorktown Heights NY)

The paper presents a maximum entropy Chinese character-based parser trained on the Chinese Treebank (CTB henceforth). Word-based parse trees in CTB are first converted into character-based trees, where word-level part-of-speech (POS) tags become constituent labels and character-level tags are derived from word-level POS tags. A maximum entropy parser is then trained on the character-based corpus. The parser does word segmentation, POS tagging and parsing in a unified framework. An average label F-measure of 81.4% and a word-segmentation F-measure of 96.0% are achieved by the parser. Our results show that word-level POS tags can significantly improve word segmentation, but higher-level syntactic structures are of little use to word segmentation in the maximum ...
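The tree conversion described in the abstract can be illustrated with a minimal Python sketch. The abstract says only that character-level tags are derived from word-level POS tags; the suffix scheme used here (`b`, `m`, `e` for the beginning, middle and end of a word, `u` for a single-character word) is an assumption made for illustration, as are the function names `char_tags` and `to_char_tree` and the tuple-based tree encoding.

```python
from typing import List, Tuple

def char_tags(pos: str, word: str) -> List[Tuple[str, str]]:
    """Derive character-level tags from a word-level POS tag.

    Assumed scheme (not confirmed by the abstract): POS tag plus a
    position suffix, 'u' for a one-character word, otherwise 'b',
    'm', 'e' for the first, middle and last characters.
    """
    n = len(word)
    if n == 1:
        return [(pos + "u", word)]
    positions = ["b"] + ["m"] * (n - 2) + ["e"]
    return [(pos + p, ch) for p, ch in zip(positions, word)]

def to_char_tree(tree):
    """Convert a word-based tree into a character-based tree.

    A tree is (label, children), where children is a list of subtrees,
    or (pos_tag, word) for a word leaf. In the output, each word-level
    POS tag becomes a constituent label spanning character leaves.
    """
    label, body = tree
    if isinstance(body, str):
        # Word leaf: the POS tag is kept as a constituent label.
        return (label, char_tags(label, body))
    return (label, [to_char_tree(child) for child in body])

# Hypothetical two-word example, not taken from CTB.
word_tree = ("IP", [("NP", [("NR", "上海")]), ("VP", [("VV", "发展")])])
char_tree = to_char_tree(word_tree)
```

In such a character-based tree, the word boundaries can be read off the nodes that carry word-level POS labels, which is how a single parser can produce segmentation, POS tags and syntactic structure together.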