Paper: Accurate Word Segmentation and POS Tagging for Japanese Microblogs: Corpus Annotation and Joint Modeling with Lexical Normalization

ACL ID D14-1011
Title Accurate Word Segmentation and POS Tagging for Japanese Microblogs: Corpus Annotation and Joint Modeling with Lexical Normalization
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2014
Authors

Microblogs have recently received widespread interest from NLP researchers. However, current tools for Japanese word segmentation and POS tagging still perform poorly on microblog texts. We developed an annotated corpus and proposed a joint model for overcoming this situation. Our annotated corpus of microblog texts enables not only training of accurate statistical models but also quantitative evaluation of their performance. Our joint model with lexical normalization handles the orthographic diversity of microblog texts. We conducted an experiment to demonstrate that the corpus and model substantially contribute to boosting accuracy.