Paper: Japanese Unknown Word Identification By Character-Based Chunking

ACL ID C04-1066
Title Japanese Unknown Word Identification By Character-Based Chunking
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2004
Authors

We introduce a character-based chunking method for unknown word identification in Japanese text. A major advantage of our method is its ability to detect low-frequency unknown words with unrestricted character type patterns. The method is built on SVM-based chunking, using character n-grams and the surrounding context of n-best word segmentation candidates from statistical morphological analysis as features. It is applied to newspaper and patent texts, achieving 95% precision and 55-70% recall for newspapers and more than 85% precision for patent texts.
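
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tag inventory, so the B/I/E/S word-position tags, the B/I/O chunk labels, the window size, and the function names `position_tags`, `char_features`, and `decode_chunks` are all assumptions made for illustration. The SVM classifier itself is omitted; the sketch only shows how each character might be described to a chunker and how per-character labels might be turned back into unknown-word spans.

```python
from typing import Dict, List, Tuple


def position_tags(segmentation: List[str]) -> List[str]:
    """Map one word segmentation to per-character position tags.

    Assumed scheme: S = single-character word, B/I/E = begin/inside/end
    of a longer word. Empty strings in the segmentation are skipped.
    """
    tags: List[str] = []
    for word in segmentation:
        if not word:
            continue
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def char_features(text: str, nbest: List[List[str]],
                  window: int = 2) -> List[Dict[str, str]]:
    """Build one feature dict per character.

    Features: the surrounding characters within `window` (a stand-in for
    character n-gram context) and the position tag of the character under
    each of the n-best segmentation candidates.
    """
    for seg in nbest:
        if "".join(seg) != text:
            raise ValueError("every candidate must cover the text exactly")
    per_candidate = [position_tags(seg) for seg in nbest]
    features: List[Dict[str, str]] = []
    for i in range(len(text)):
        feats: Dict[str, str] = {}
        for off in range(-window, window + 1):
            j = i + off
            feats[f"char[{off}]"] = text[j] if 0 <= j < len(text) else "<pad>"
        for k, tags in enumerate(per_candidate):
            feats[f"pos_tag[{k}]"] = tags[i]
        features.append(feats)
    return features


def decode_chunks(text: str, labels: List[str]) -> List[Tuple[int, int, str]]:
    """Turn per-character B/I/O labels into (start, end, surface) spans.

    An "I" with no open chunk is treated as starting a new chunk.
    """
    spans: List[Tuple[int, int, str]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:
                spans.append((start, i, text[start:i]))
            start = i
        elif label == "I":
            if start is None:
                start = i
        else:  # "O": close any open chunk
            if start is not None:
                spans.append((start, i, text[start:i]))
                start = None
    if start is not None:
        spans.append((start, len(text), text[start:]))
    return spans


if __name__ == "__main__":
    text = "東京都に住む"
    # Two hypothetical segmentation candidates for the same string.
    nbest = [["東京", "都", "に", "住む"], ["東", "京都", "に", "住む"]]
    for ch, feats in zip(text, char_features(text, nbest, window=1)):
        print(ch, feats)
    # Hypothetical classifier output marking the first three characters.
    print(decode_chunks(text, ["B", "I", "I", "O", "O", "O"]))
```

In a full system the feature dicts would be vectorized and passed to a trained classifier, which would emit one chunk label per character. Because the labels are assigned per character rather than per word, a chunk can span any sequence of characters regardless of script, which is consistent with the abstract's claim about unrestricted character type patterns.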