Paper: Use of Mutual Information Based Character Clusters in Dictionary-less Morphological Analysis of Japanese

ACL ID C98-1104
Title Use of Mutual Information Based Character Clusters in Dictionary-less Morphological Analysis of Japanese
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1998
Authors

For languages whose character set is very large and whose orthography does not require spacing between words, such as Japanese, tokenizing and part-of-speech tagging are often the difficult parts of any morphological analysis. For practical systems to tackle this problem, uncontrolled heuristics are primarily used. The use of information on character sorts, however, mitigates this difficulty. This paper presents our method of incorporating character clustering based on mutual information into Decision-Tree Dictionary-less morphological analysis. By using natural classes, we have confirmed that our morphological analyzer has been significantly improved in both tokenizing and tagging Japanese text.
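To make the idea of mutual-information-based character clustering concrete, here is a minimal sketch. The abstract does not describe the clustering procedure, so this assumes a simple greedy bottom-up merge: each character starts in its own class, and at each step the pair of classes whose merge keeps the average mutual information between adjacent character classes highest is merged. The function names `average_mutual_information` and `cluster_characters` are hypothetical, and the exhaustive search is only practical for toy inputs. It is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(bigrams, assign):
    """Average mutual information (bits) between classes of adjacent characters.

    bigrams: list of (left_char, right_char) pairs.
    assign:  dict mapping each character to its class label.
    """
    pair = Counter((assign[a], assign[b]) for a, b in bigrams)
    total = sum(pair.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pair.items():
        left[c1] += n
        right[c2] += n
    # sum over class pairs of p(c1, c2) * log2(p(c1, c2) / (p(c1) * p(c2)))
    return sum(
        (n / total) * math.log2(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in pair.items()
    )


def cluster_characters(text, num_classes):
    """Greedily merge character classes until num_classes remain (assumed procedure)."""
    bigrams = list(zip(text, text[1:]))
    assign = {ch: ch for ch in set(text)}  # every character starts as its own class
    while len(set(assign.values())) > num_classes:
        best = None
        for c1, c2 in combinations(sorted(set(assign.values())), 2):
            # Try merging class c2 into class c1 and score the result.
            trial = {ch: (c1 if c == c2 else c) for ch, c in assign.items()}
            score = average_mutual_information(bigrams, trial)
            if best is None or score > best[0]:
                best = (score, trial)
        assign = best[1]
    return assign
```

Calling `cluster_characters(corpus_text, k)` returns a dictionary from each character to a class label. In a setting like the one the abstract describes, such class labels could serve as features for a decision-tree analyzer in place of hand-chosen character sorts, though how the paper actually uses them is not stated here.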