Paper: A Part Of Speech Estimation Method For Japanese Unknown Words Using A Statistical Model Of Morphology And Context

ACL ID P99-1036
Title A Part Of Speech Estimation Method For Japanese Unknown Words Using A Statistical Model Of Morphology And Context
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1999
Authors

We present a statistical model of Japanese unknown words that consists of a set of length and spelling models, classified by the character types that constitute a word. The key idea is simple: different character sets should be treated differently, and the transitions between character types are very important, because Japanese script mixes ideograms such as kanji (shared with Chinese) and phonograms such as katakana (which, like the English alphabet, encode sound rather than meaning). The proposed model improves both word segmentation accuracy and part of speech tagging accuracy. It can achieve 96.6% tagging accuracy if unknown words are correctly segmented.
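The abstract does not give the exact form of the length and spelling models, so the following is only a minimal Python sketch of the general idea of conditioning both models on character type. Several pieces are assumptions, not taken from the paper. Character types come from Unicode block ranges. A word's type is the sequence of character-type runs it contains (for example, 食べる becomes `kanji+hiragana`). Each word type gets a Poisson length model and an add-one-smoothed character bigram spelling model. The names `char_type`, `word_type`, and `TypedUnknownWordModel` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_type(ch: str) -> str:
    """Coarse character type from Unicode ranges (an assumed scheme)."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:  # Katakana block, incl. long-vowel mark U+30FC
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs (kanji)
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "symbol"


def word_type(word: str) -> str:
    """Collapse runs of identical character types, e.g. 食べる -> kanji+hiragana."""
    runs = []
    for ch in word:
        t = char_type(ch)
        if not runs or runs[-1] != t:
            runs.append(t)
    return "+".join(runs)


class TypedUnknownWordModel:
    """Hypothetical sketch: one length model and one spelling model per word type."""

    def __init__(self) -> None:
        self.lengths = defaultdict(list)     # word type -> observed word lengths
        self.bigrams = defaultdict(Counter)  # word type -> (prev, next) char counts
        self.history = defaultdict(Counter)  # word type -> counts of prev symbol
        self.chars = set()                   # character vocabulary

    def train(self, words) -> None:
        for w in words:
            t = word_type(w)
            self.lengths[t].append(len(w))
            seq = ["<s>"] + list(w)
            for a, b in zip(seq, seq[1:]):
                self.bigrams[t][(a, b)] += 1
                self.history[t][a] += 1
            self.chars.update(w)

    def bigram_prob(self, t: str, a: str, b: str) -> float:
        # Add-one smoothing over the character vocabulary (assumed choice).
        v = len(self.chars)
        return (self.bigrams[t][(a, b)] + 1) / (self.history[t][a] + v)

    def log_prob(self, w: str) -> float:
        t = word_type(w)
        if t not in self.lengths:
            return float("-inf")  # no model for an unseen word type
        # Length model: plain Poisson with the mean length of this word type.
        lam = sum(self.lengths[t]) / len(self.lengths[t])
        k = len(w)
        lp = k * math.log(lam) - lam - math.lgamma(k + 1)
        # Spelling model: character bigrams conditioned on the word type.
        seq = ["<s>"] + list(w)
        for a, b in zip(seq, seq[1:]):
            lp += math.log(self.bigram_prob(t, a, b))
        return lp


model = TypedUnknownWordModel()
model.train(["コンピュータ", "プリンタ", "食べる", "東京"])
print(word_type("インターネット"))  # katakana
print(math.isfinite(model.log_prob("インターネット")))  # True
```

Because each word type has its own length distribution and bigram table, katakana loanwords, kanji compounds, and mixed kanji-hiragana words are scored by separate statistics. The type-change boundaries captured by `word_type` are what distinguish one model from another.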