Paper: Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study

ACL ID P09-1059
Title Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors

Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling there exist multiple corpora with different and incompatible annotation guidelines or standards. This seems to be a great waste of human efforts, and it would be nice to automatically adapt one annotation standard to another. We present a simple yet effective strategy that transfers knowledge from a differently annotated corpus to the corpus with desired annotation. We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. Experiments show that adaptation from the much larger People’s Da...