Paper: Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning

ACL ID D13-1062
Title Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013
Authors

Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanced Chinese language processing tasks. Recently, it has attracted more and more research interests to exploit heterogeneous annotation corpora for Chinese S&T. In this paper, we propose a unified model for Chinese S&T with heterogeneous annotation corpora. We first automatically construct a loose and uncertain mapping between two representative heterogeneous corpora, Penn Chinese Treebank (CTB) and PKU's People's Daily (PPD). Then we regard the Chinese S&T with heterogeneous corpora as two "related" tasks and train our model on two heterogeneous corpora simultaneously. Experiments show that our method can boost the performances of both of the heterogeneous corpora by using the shared i...
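
The abstract does not say how the loose and uncertain mapping between the two tag sets is constructed. The sketch below is one plausible illustration only, not the authors' procedure: it assumes we already have tokens that carry both a CTB-style tag and a PPD-style tag, and it estimates a one-to-many mapping from co-occurrence frequencies. The function name `build_loose_mapping`, the `min_prob` threshold, and the toy tag pairs are all hypothetical.

```python
from collections import Counter, defaultdict


def build_loose_mapping(tag_pairs, min_prob=0.1):
    """Estimate a one-to-many tag mapping from co-occurrence counts.

    tag_pairs: iterable of (source_tag, target_tag) observed on the same token.
    min_prob:  hypothetical cutoff; target tags rarer than this are dropped.
    Returns {source_tag: {target_tag: relative_frequency}}.
    """
    pair_counts = Counter(tag_pairs)

    # Total number of observations per source tag, used for normalisation.
    source_totals = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_totals[src] += n

    # Keep every target tag whose relative frequency reaches the cutoff,
    # so one source tag may map to several target tags ("loose" mapping).
    mapping = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        prob = n / source_totals[src]
        if prob >= min_prob:
            mapping[src][tgt] = prob
    return dict(mapping)


# Toy data invented for illustration (CTB-style tag, PPD-style tag).
toy_pairs = [
    ("NN", "n"), ("NN", "n"), ("NN", "vn"),
    ("VV", "v"), ("VV", "v"), ("VV", "v"), ("VV", "vn"),
    ("NR", "nr"),
]

loose = build_loose_mapping(toy_pairs, min_prob=0.2)
strict = build_loose_mapping(toy_pairs, min_prob=0.3)
```

With the lower cutoff, `VV` maps to both `v` and `vn`; raising the cutoff removes the rarer `vn` candidate. The retained frequencies are one way to represent the "uncertain" part of such a mapping.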