Paper: Subword-Based Tagging By Conditional Random Fields For Chinese Word Segmentation

ACL ID N06-2049
Title Subword-Based Tagging By Conditional Random Fields For Chinese Word Segmentation
Venue Human Language Technologies
Session Short Paper
Year 2006
Authors
  • Ruiqiang Zhang (NTT Cyber Space Laboratories, Kanagawa Japan)
  • Genichiro Kikui (National Institute of Information and Communications Technology, Kyoto Japan; ATR Spoken Language Communication Research Laboratories, Kyoto Japan)
  • Eiichiro Sumita

We proposed two approaches to improve Chinese word segmentation: subword-based tagging and a confidence measure approach. We found that the former achieved better performance than the existing character-based tagging, and that the latter improved segmentation further by combining the former with a dictionary-based segmentation. In addition, the latter can be used to balance out-of-vocabulary rates and in-vocabulary rates. With these techniques we achieved higher F-scores on the CITYU, PKU and MSR corpora than the best results from the Sighan Bakeoff 2005.
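
To make the contrast between character-based and subword-based tagging concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the tag set, the subword lexicon, or how words are split into subwords, so the B/I position tags, the greedy longest-match split, and the names `split_units` and `tag_words` are all assumptions made for illustration. The conditional random field that would be trained on such tagged sequences is left out.

```python
def split_units(word, subwords):
    """Split one word into tagging units.

    Assumption: greedy longest match against a subword lexicon,
    falling back to single characters when nothing longer matches.
    """
    units = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character always matches.
        for j in range(len(word), i, -1):
            if j - i == 1 or word[i:j] in subwords:
                units.append(word[i:j])
                i = j
                break
    return units


def tag_words(words, subwords=frozenset()):
    """Turn a gold segmentation into (unit, tag) training pairs.

    Assumed tag scheme: "B" marks the first unit of a word, "I" any later unit.
    With an empty lexicon every unit is one character (character-based tagging).
    """
    tagged = []
    for word in words:
        for k, unit in enumerate(split_units(word, subwords)):
            tagged.append((unit, "B" if k == 0 else "I"))
    return tagged


gold = ["我", "住在", "北京市"]               # hypothetical segmented sentence
char_tags = tag_words(gold)                   # character-based units
subword_tags = tag_words(gold, {"北京"})      # hypothetical one-entry subword lexicon
print(char_tags)
print(subword_tags)
```

With the hypothetical lexicon above, the two characters of 北京 are tagged as one unit instead of two, so the tagger sees a shorter sequence with larger units. That is the only difference between the two schemes in this sketch.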