Paper: Morphological Analysis for Japanese Noisy Text based on Character-level and Word-level Normalization

ACL ID C14-1167
Title Morphological Analysis for Japanese Noisy Text based on Character-level and Word-level Normalization
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2014
Authors

Social media texts are often written in a non-standard style and include many lexical variants, such as insertions, phonetic substitutions, and abbreviations that mimic spoken language. The normalization of such a variety of non-standard tokens is one promising solution for handling noisy text. A normalization task is very difficult to conduct in Japanese morphological analysis because there are no explicit boundaries between words. To address this issue, in this paper we propose a novel method for normalizing and morphologically analyzing Japanese noisy text. We generate both character-level and word-level normalization candidates and use discriminative methods to formulate a cost function. Experimental results show that the proposed method achieves acceptable levels in both accuracy and r...
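
The general idea of combining normalization candidates with word segmentation can be illustrated with a toy lattice decoder. The following is a minimal sketch under stated assumptions, not the paper's implementation: the lexicon, the variant table, the character rewrite rules, the weights, and all function names (`char_candidates`, `nodes_for`, `analyze`) are hypothetical. The weights are set by hand here, whereas the abstract says the cost function is formulated with discriminative methods. Connection costs between adjacent words are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical toy resources (not taken from the paper).
LEXICON = {"すごい": ("ADJ", 1.0), "ね": ("PART", 0.5)}  # standard form -> (POS, word cost)
VARIANTS = {"すげえ": "すごい"}                           # word-level: noisy word -> standard form
CHAR_RULES = {"ー": "", "ぉ": "お"}                       # character-level: noisy substring -> standard
# Hand-set weights of a linear cost function; a real system would learn these.
WEIGHTS = {"word": 1.0, "char_norm": 0.8, "word_norm": 1.2, "unknown": 10.0}
MAX_LEN = 8  # longest surface span considered


@dataclass(frozen=True)
class Node:
    surface: str     # substring of the noisy input
    normalized: str  # standard form it maps to
    pos: str
    cost: float


def char_candidates(surface):
    """Map each character-level rewrite of `surface` to the number of rules applied."""
    cands = {surface: 0}
    for noisy, std in CHAR_RULES.items():
        for cand, n_rules in list(cands.items()):
            if noisy in cand:
                cands.setdefault(cand.replace(noisy, std), n_rules + 1)
    return cands


def nodes_for(surface):
    """Build lattice nodes for one surface span."""
    nodes = []
    # Character-level candidates: look up each rewritten string in the lexicon.
    for cand, n_rules in char_candidates(surface).items():
        if cand in LEXICON:
            pos, word_cost = LEXICON[cand]
            cost = WEIGHTS["word"] * word_cost + WEIGHTS["char_norm"] * n_rules
            nodes.append(Node(surface, cand, pos, cost))
    # Word-level candidates: look up the whole span in the variant table.
    if surface in VARIANTS:
        std = VARIANTS[surface]
        pos, word_cost = LEXICON[std]
        cost = WEIGHTS["word"] * word_cost + WEIGHTS["word_norm"]
        nodes.append(Node(surface, std, pos, cost))
    # Fallback so that every input has at least one path through the lattice.
    if not nodes and len(surface) == 1:
        nodes.append(Node(surface, surface, "UNK", WEIGHTS["unknown"]))
    return nodes


def analyze(text):
    """Return the minimum-cost node sequence covering `text` (Viterbi over positions)."""
    n = len(text)
    best = [(float("inf"), None)] * (n + 1)  # (cost so far, (start index, node))
    best[0] = (0.0, None)
    for i in range(n):
        if best[i][0] == float("inf"):
            continue
        for j in range(i + 1, min(n, i + MAX_LEN) + 1):
            for node in nodes_for(text[i:j]):
                cost = best[i][0] + node.cost
                if cost < best[j][0]:
                    best[j] = (cost, (i, node))
    # Follow back-pointers from the end of the text to the start.
    path, end = [], n
    while end > 0:
        start, node = best[end][1]
        path.append(node)
        end = start
    return path[::-1]
```

With these toy resources, `analyze("すごーいね")` segments the input into the surfaces `すごーい` and `ね` and normalizes the first to `すごい` through the character-level rule that deletes `ー`, while `analyze("すげえね")` reaches the same normalized forms through the word-level variant table. Segmentation and normalization are decided jointly because both kinds of candidates compete as nodes in the same lattice.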