Paper: Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations

ACL ID P12-1025
Title Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

We address the issue of consuming heterogeneous annotation data for Chinese word segmentation and part-of-speech tagging. We empirically analyze the diversity between two representative corpora, i.e. Penn Chinese Treebank (CTB) and PKU's People's Daily (PPD), on manually mapped data, and show that their linguistic annotations are systematically different and highly compatible. The analysis is further exploited to improve processing accuracy by (1) integrating systems that are respectively trained on heterogeneous annotations to reduce the approximation error, and (2) re-training models with high-quality automatically converted data to reduce the estimation error. Evaluation on the CTB and PPD data shows that our novel model achieves a relative error reduction of 11% over the be...