Paper: Document Classification Using Domain Specific Kanji Characters Extracted By X2 Method

ACL ID C96-2134
Title Document Classification Using Domain Specific Kanji Characters Extracted By X2 Method
Venue International Conference on Computational Linguistics
Session Main Conference
Year 1996
Authors

In this paper we describe a method of classifying Japanese text documents using domain specific kanji characters. Text documents are generally classified by significant words (keywords) of the documents. However, it is difficult to extract these significant words from Japanese text, because Japanese texts are written without blank spaces as delimiters and must be segmented into words. Therefore, instead of words, we used domain specific kanji characters which appear more frequently in one domain than in the other. We extracted these domain specific kanji characters by the χ² method. Then, using these domain specific kanji characters, we classified editorial columns "TENSEI JINGO", editorial articles, and articles in "Scientific American (in Japanese)". The correct recogni...
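The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the standard Pearson χ² contribution, (observed − expected)² / expected, computed per kanji and per domain from a character-by-domain count table; the abstract does not give the paper's exact formulation, so this is an illustration rather than the authors' implementation. The names `is_kanji`, `domain_specific_kanji`, and `top_n` are hypothetical, and restricting kanji to the basic CJK Unified Ideographs block is also an assumption.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: treat only the basic CJK Unified Ideographs block as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def domain_specific_kanji(docs_by_domain, top_n=5):
    """Rank kanji per domain by their chi-square contribution.

    docs_by_domain maps a domain name to a list of text strings.
    Returns a dict mapping each domain to its top_n kanji that occur
    more often than expected under independence of character and domain.
    """
    # Kanji frequency counts per domain; no word segmentation is needed.
    counts = {
        domain: Counter(ch for text in texts for ch in text if is_kanji(ch))
        for domain, texts in docs_by_domain.items()
    }
    domain_totals = {d: sum(c.values()) for d, c in counts.items()}
    grand_total = sum(domain_totals.values())

    # Total frequency of each kanji over all domains.
    char_totals = Counter()
    for c in counts.values():
        char_totals.update(c)

    result = {}
    for domain, c in counts.items():
        scored = []
        for ch, total in char_totals.items():
            # Expected count if the kanji were spread evenly across domains.
            expected = total * domain_totals[domain] / grand_total
            observed = c[ch]
            # Keep only kanji over-represented in this domain.
            if observed > expected:
                scored.append(((observed - expected) ** 2 / expected, ch))
        scored.sort(reverse=True)
        result[domain] = [ch for _, ch in scored[:top_n]]
    return result


docs = {
    "science": ["科学科学実験"],
    "politics": ["政治政治選挙"],
}
specific = domain_specific_kanji(docs, top_n=2)
```

With this toy input, `specific["science"]` contains 科 and 学, the two kanji that occur most often in the science text and never in the politics text. Working on single characters sidesteps the segmentation problem the abstract mentions, since counting kanji requires no word boundaries.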