Paper: Capturing Errors in Written Chinese Words

ACL ID P09-2007
Title Capturing Errors in Written Chinese Words
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2009

A collection of 3208 reported errors in Chinese words was analyzed. Of these, 7.2% involved rarely used characters, and 98.4% were assigned common classifications of their causes by human subjects. In particular, 80% of the errors observed in the writings of middle school students were related to the pronunciations and 30% were related to the compositions of words. Experimental results show that using intuitive Web-based statistics helped us capture only about 75% of these errors. In a related task, the Web-based statistics are useful about 93% of the time for recommending incorrect characters for composing test items for "incorrect character identification" tests.
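
The idea of using Web-based statistics can be illustrated with a minimal sketch. This is not the procedure used in the paper: the function name `rank_candidates`, the candidate list, and the counts in `toy_counts` are all hypothetical, and the counts are made-up numbers standing in for hit counts that a Web search might return. The sketch substitutes each candidate character into a word and ranks the resulting variants by how often each one is assumed to appear.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    word: str,
    position: int,
    candidates: List[str],
    hit_counts: Dict[str, int],
) -> List[Tuple[str, int]]:
    """Rank variants of `word` by an assumed frequency count.

    Each candidate character replaces the character at `position`.
    `hit_counts` is a hypothetical lookup table standing in for
    Web-based statistics; variants missing from it count as 0.
    """
    ranked = []
    for ch in candidates:
        variant = word[:position] + ch + word[position + 1:]
        ranked.append((variant, hit_counts.get(variant, 0)))
    # Highest count first; the top variant is the most plausible spelling.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy numbers for illustration only, not measured data.
toy_counts = {"按部就班": 1000, "按步就班": 120}

# The written word uses 步 where 部 is correct; both candidates sound alike.
ranking = rank_candidates("按步就班", 1, ["部", "步", "布"], toy_counts)
best_variant, best_count = ranking[0]
```

Under these assumed counts, the correct form 按部就班 ranks first. The lower-ranked variants are the kind of plausible incorrect forms that could serve as distractors in an "incorrect character identification" test item.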