Paper: Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus

ACL ID I05-1060
Title Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus
Venue International Joint Conference on Natural Language Processing
Session Main Conference
Year 2005
Authors

Katakana, the Japanese phonographic script used mainly for loanwords, is a troublemaker in Japanese word segmentation. Because Katakana words are heavily domain-dependent and Katakana neologisms are numerous, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes a method for automatically segmenting Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium-sized or large Japanese corpus from a given domain.
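
To make the idea of corpus-driven compound segmentation concrete, the following is a minimal sketch and not the paper's method, which the abstract does not describe. It assumes a simple heuristic: a Katakana string is split in two when both halves also occur as standalone Katakana runs in the same corpus. The function names (`katakana_counts`, `split_compound`), the `min_len` parameter, and the scoring rule are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Katakana letters (U+30A1..U+30FA) plus the prolonged sound mark (U+30FC).
# The middle dot (U+30FB) is deliberately excluded so it acts as a separator.
KATAKANA_RUN = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def katakana_counts(corpus):
    """Count every maximal Katakana run found in an iterable of text lines."""
    counts = Counter()
    for line in corpus:
        counts.update(KATAKANA_RUN.findall(line))
    return counts


def split_compound(word, counts, min_len=2):
    """Split `word` into two parts if both parts occur as standalone runs.

    Assumed heuristic (not from the paper): pick the split point whose
    rarer half is the most frequent; return the word whole if no split
    point has both halves attested in the corpus.
    """
    best, best_score = None, 0
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        score = min(counts[left], counts[right])
        if score > best_score:
            best, best_score = [left, right], score
    return best or [word]


corpus = ["データベースを使う", "データを見る", "ベースとなる"]
counts = katakana_counts(corpus)
parts = split_compound("データベース", counts)
whole = split_compound("データ", counts)
```

Under this assumed heuristic, words that cannot be split any further would be the candidates for a basic lexicon, while compounds are represented by their parts.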