Paper: Morphological Analysis Of A Large Spontaneous Speech Corpus In Japanese

ACL ID P03-1061
Title Morphological Analysis Of A Large Spontaneous Speech Corpus In Japanese
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2003
Authors

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontaneous speech corpus accurately by using the two methods. The first method is used to detect any type of word segment. The second method is used when there are several definitions for word segments and their POS categories, and when one type of word segment includes another type. We show that by using semi-automatic analysis we achieve a precision of better than 99% for detecting and tagging short words and 97% for long words, the two types of words that comprise the corpus. We also show that better accuracy is achieved by using both methods than by using only the first.