Paper: Robust Segmentation Of Japanese Text Into A Lattice For Parsing

ACL ID C00-1057
Title Robust Segmentation Of Japanese Text Into A Lattice For Parsing
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2000
Authors

We describe a segmentation component that utilizes minimal syntactic knowledge to produce a lattice of word candidates for a broad coverage Japanese NL parser. The segmenter is a finite state morphological analyzer and text normalizer designed to handle the orthographic variations characteristic of written Japanese, including alternate spellings, script variation, vowel extensions and word-internal parenthetical material. This architecture differs from conventional Japanese wordbreakers in that it does not attempt to simultaneously attack the problems of identifying segmentation candidates and choosing the most probable analysis. To minimize duplication of effort between components and to give the segmenter greater freedom to address orthography issues, the task of choosing the best ana...
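
The separation the abstract describes, proposing every segmentation candidate without ranking them, can be illustrated with a minimal sketch. This is not the paper's implementation: the toy lexicon, the function names `build_lattice` and `enumerate_paths`, and the `max_len` cutoff are all assumptions made for illustration, and a plain dictionary lookup stands in for the finite state analyzer and normalizer.

```python
from collections import defaultdict

# Hypothetical toy lexicon (an assumption); the paper's lexicon is not shown here.
TOY_LEXICON = {"東", "京", "都", "東京", "京都", "東京都", "に", "住む"}


def build_lattice(text, lexicon, max_len=4):
    """Collect every lexicon match as a (start, end, surface) edge.

    No edge is scored or discarded: choosing among analyses is left
    to a later component, such as a parser.
    """
    edges = []
    for start in range(len(text)):
        last_end = min(len(text), start + max_len)
        for end in range(start + 1, last_end + 1):
            surface = text[start:end]
            if surface in lexicon:
                edges.append((start, end, surface))
    return edges


def enumerate_paths(edges, length):
    """List every complete segmentation the lattice licenses."""
    by_start = defaultdict(list)
    for start, end, surface in edges:
        by_start[start].append((end, surface))

    def walk(pos):
        if pos == length:
            yield []
            return
        for end, surface in by_start[pos]:
            for rest in walk(end):
                yield [surface] + rest

    return list(walk(0))


if __name__ == "__main__":
    sentence = "東京都に住む"
    lattice = build_lattice(sentence, TOY_LEXICON)
    for path in enumerate_paths(lattice, len(sentence)):
        print(" / ".join(path))
```

In this sketch the ambiguous span 東京都 stays in the lattice under all of its analyses, so the consumer of the lattice, rather than the segmenter, decides which one fits the sentence.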