Paper: Example-Based Correction Of Word Segmentation And Part Of Speech Labelling

ACL ID H93-1045
Title Example-Based Correction Of Word Segmentation And Part Of Speech Labelling
Venue Human Language Technologies
Session Main Conference
Year 1993
Authors

This paper describes an example-based correction component (AMED) for Japanese word segmentation and part-of-speech labelling, and a way of combining it with a pre-existing rule-based Japanese morphological analyzer and a probabilistic part-of-speech tagger. Statistical algorithms rely on the frequency of phenomena or events in corpora; however, low-frequency events are often inadequately represented. Here we report on an example-based technique for finding word segments and their parts of speech in Japanese text. Rather than using hand-crafted rules, the algorithm employs example data, drawing generalizations during training.
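To make the idea of correcting an analyzer's output from examples more concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not specify how AMED generalizes from its examples, so the sketch simply memorizes, for each erroneous token sequence seen in training, its most frequent correction. The function names `learn_corrections` and `apply_corrections`, the `(surface, tag)` token representation, and the toy Latin-letter data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def learn_corrections(examples):
    """Learn correction rules from (analyzer_output, corrected_output) pairs.

    Each side is a list of (surface, tag) tokens covering the same text span.
    Hypothetical helper: it memorizes the most frequent correction per
    erroneous pattern and does no further generalization.
    """
    counts = defaultdict(Counter)
    for wrong, right in examples:
        counts[tuple(wrong)][tuple(right)] += 1

    rules = {}
    for wrong, corrections in counts.items():
        best = corrections.most_common(1)[0][0]
        if best != wrong:  # skip patterns that are usually already correct
            rules[wrong] = best
    return rules


def apply_corrections(tokens, rules):
    """Rewrite tokens left to right, preferring the longest matching pattern."""
    max_len = max((len(pattern) for pattern in rules), default=0)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(tokens[i:i + n])
            if window in rules:
                out.extend(rules[window])
                i += n
                break
        else:
            # No pattern starts here; keep the analyzer's token unchanged.
            out.append(tokens[i])
            i += 1
    return out


# Toy data (not from the paper): the analyzer splits "abc" as "ab" + "c",
# while the corrected examples split it as "a" + "bc" with different tags.
examples = [
    ([("ab", "N"), ("c", "P")], [("a", "N"), ("bc", "V")]),
    ([("ab", "N"), ("c", "P")], [("a", "N"), ("bc", "V")]),
]
rules = learn_corrections(examples)
corrected = apply_corrections([("x", "N"), ("ab", "N"), ("c", "P")], rules)
```

Because a rule replaces a whole token sequence, a single correction can change both the segmentation boundaries and the part-of-speech tags, which is why the two problems are handled together in this sketch.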