Paper: An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging

ACL ID P09-1058
Title An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2009
Authors

In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
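For readers unfamiliar with MIRA, the following is a minimal sketch of a single-constraint (1-best) MIRA-style update over sparse feature vectors. It is an illustration only, not the training procedure used in the paper: the abstract does not specify the variant, the loss function, or the feature set, so the function names (`dot`, `mira_update`), the clipping constant `C`, and the toy feature keys are all assumptions introduced here.

```python
def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def mira_update(weights, gold_feats, pred_feats, loss, C=1.0):
    """Apply one 1-best MIRA-style update in place and return the step size.

    weights    : dict mapping feature name -> weight (modified in place)
    gold_feats : feature counts of the reference analysis (hypothetical features)
    pred_feats : feature counts of the model's current best analysis
    loss       : non-negative cost of predicting pred instead of gold
    C          : assumed upper bound on the step size
    """
    # Difference vector f(gold) - f(pred).
    diff = dict(gold_feats)
    for k, v in pred_feats.items():
        diff[k] = diff.get(k, 0.0) - v

    sq_norm = sum(v * v for v in diff.values())
    if sq_norm == 0.0:
        return 0.0  # identical feature vectors: nothing to update

    # Current margin between gold and predicted analyses.
    margin = dot(weights, diff)

    # Smallest step that makes the margin at least the loss, clipped to [0, C].
    tau = max(0.0, min(C, (loss - margin) / sq_norm))

    for k, v in diff.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau


# Toy usage with made-up feature names.
w = {}
gold = {"word=AB/NN": 1.0, "char=A/B-NN": 1.0}
pred = {"char=A/B-NR": 1.0, "char=B/E-NR": 1.0}
step = mira_update(w, gold, pred, loss=2.0, C=1.0)
```

In this toy case the update moves the weights just far enough that the gold analysis outscores the prediction by the loss value; a second call with the same inputs leaves the weights unchanged.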