Paper: Automatic Semantic Sequence Extraction From Unrestricted Non-Tagged Texts

ACL ID C00-1084
Title Automatic Semantic Sequence Extraction From Unrestricted Non-Tagged Texts
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000
Authors

Morphological processing, syntactic parsing and other useful tools have been proposed in the field of natural language processing (NLP). Many of those NLP tools take dictionary-based approaches. Thus these tools are often not very efficient with texts written in casual wordings or texts which contain many domain-specific terms, because of the lack of vocabulary. In this paper we propose a simple method to obtain domain-specific sequences from unrestricted texts using statistical information only. This method is language-independent. We had experiments on sequence extraction on email texts in Japanese, and succeeded in extracting significant semantic sequences in the test corpus. We tried morphological parsing on the test corpus with ChaSen, a Japanese dictionary-based morphological pa...