Paper: Self-Organizing N-Gram Model For Automatic Word Spacing

ACL ID P06-1080
Title Self-Organizing N-Gram Model For Automatic Word Spacing
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006
Authors

Automatic word spacing is one of the important tasks in Korean language processing and information retrieval. Because Korean word spacing has many confusing cases, mistakes appear in many texts, including news articles. This paper presents a highly accurate method for automatic word spacing based on a self-organizing n-gram model. The method is basically a variant of the n-gram model, but it achieves high accuracy by automatically adapting the context size. To find the optimal context size, the proposed method automatically increases the context size when the contextual distribution after increasing it does not agree with that of the current context. It also decreases the context size when the distribution of the reduced context is similar to that of the cur...
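The context-size adaptation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how agreement between distributions is measured, so the sketch assumes KL divergence over add-alpha smoothed label distributions. The function names (`train`, `distribution`, `select_context_size`), the two labels, and the thresholds `grow` and `shrink` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict

# Assumed two-way decision: is there a space after the current position?
LABELS = ("space", "nospace")


def train(samples, max_n):
    """Count labels for every context suffix of length 1..max_n.

    samples: iterable of (history, label), where history is a sequence
    of symbols (e.g. syllables) preceding the spacing decision.
    """
    counts = defaultdict(Counter)
    for history, label in samples:
        for n in range(1, min(max_n, len(history)) + 1):
            counts[tuple(history[-n:])][label] += 1
    return counts


def distribution(counts, context, alpha=0.5):
    """Add-alpha smoothed label distribution for one context (assumption)."""
    c = counts.get(tuple(context), Counter())
    total = sum(c.values()) + alpha * len(LABELS)
    return [(c[label] + alpha) / total for label in LABELS]


def kl(p, q):
    """KL divergence D(p || q); assumed here as the agreement measure."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def select_context_size(counts, history, start_n=2, max_n=4,
                        grow=0.1, shrink=0.01):
    """Pick a context size for `history` by growing or shrinking from start_n."""
    limit = min(max_n, len(history))
    n = min(start_n, limit)
    grew = False
    # Grow while the longer context was seen and its distribution
    # disagrees with that of the current context.
    while n < limit:
        longer = tuple(history[-(n + 1):])
        if longer not in counts:
            break
        divergence = kl(distribution(counts, longer),
                        distribution(counts, history[-n:]))
        if divergence <= grow:
            break
        n += 1
        grew = True
    # Otherwise shrink while the reduced context's distribution is
    # similar to that of the current context.
    if not grew:
        while n > 1:
            divergence = kl(distribution(counts, history[-n:]),
                            distribution(counts, history[-(n - 1):]))
            if divergence >= shrink:
                break
            n -= 1
    return n
```

In this sketch a longer context is kept only when it changes the predicted label distribution, and a shorter one is preferred when it predicts the same thing, so better-estimated short contexts are used wherever the extra symbols carry no information. The threshold values are placeholders and would need tuning on held-out data.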