Paper: Japanese Word Segmentation By Hidden Markov Model

ACL ID H94-1054
Title Japanese Word Segmentation By Hidden Markov Model
Venue Human Language Technologies
Session Main Conference
Year 1994
Authors

Processing Japanese text is complicated by the absence of word delimiters. To segment Japanese text, systems typically rely on knowledge-based methods and large lexicons. This paper presents a novel approach to Japanese word segmentation that avoids the need for Japanese word lexicons and explicit rule bases. The algorithm uses a hidden Markov model, a stochastic process, to determine word boundaries. The method achieved 91% accuracy in segmenting words in a test corpus.
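
The abstract does not specify the model's states, observations, or training procedure, so the following is only a minimal sketch of how a hidden Markov model can be used to place word boundaries, not the paper's actual method. It assumes a two-state model over characters (B: the character begins a word, I: the character continues the current word) decoded with the standard Viterbi algorithm. The names `viterbi` and `segment`, and the probability tables `START_P`, `TRANS_P`, and `EMIT_P`, are hypothetical; the probabilities are hand-set toy values for illustration, whereas a real system would estimate them from a corpus.

```python
import math

STATES = ("B", "I")  # B: character begins a word, I: character continues one

# Hypothetical hand-set toy parameters (assumption, not taken from the paper).
START_P = {"B": 0.99, "I": 0.01}
TRANS_P = {"B": {"B": 0.6, "I": 0.4}, "I": {"B": 0.8, "I": 0.2}}
EMIT_P = {
    "B": {"私": 0.25, "は": 0.25, "猫": 0.25, "で": 0.24, "す": 0.01},
    "I": {"私": 0.025, "は": 0.025, "猫": 0.025, "で": 0.025, "す": 0.9},
}


def viterbi(observations, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable state sequence for a non-empty observation list."""

    def log_emit(state, symbol):
        # Unseen symbols get a small floor probability instead of zero.
        return math.log(emit_p[state].get(symbol, floor))

    # scores[s] is the best log-probability of any path ending in state s.
    scores = {s: math.log(start_p[s]) + log_emit(s, observations[0]) for s in STATES}
    backpointers = []
    for symbol in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (
                scores[best_prev] + math.log(trans_p[best_prev][s]) + log_emit(s, symbol)
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def segment(text, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Split text into words wherever the decoded tag sequence marks a B."""
    if not text:
        return []
    tags = viterbi(list(text), start_p, trans_p, emit_p)
    words = []
    for ch, tag in zip(text, tags):
        if tag == "B" or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


print(segment("私は猫です"))  # ['私', 'は', '猫', 'です']
```

With these toy values the decoder tags the five characters as B B B B I, which groups the last two characters into one word. Log-probabilities are used so that products of many small probabilities do not underflow on longer inputs.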