Paper: Combining Prediction By Partial Matching And Logistic Regression For Thai Word Segmentation

ACL ID: C04-1175
Title: Combining Prediction By Partial Matching And Logistic Regression For Thai Word Segmentation
Venue: International Conference on Computational Linguistics
Session: Main Conference
Year: 2004
Authors:

Word segmentation is an important part of many applications, including information retrieval, information filtering, document analysis, and text summarization. In Thai, the process is complicated because words are written continuously, without spaces between them, and their structures are not well defined. Longest Matching, a dictionary-based method, is a recognized and effective approach to word segmentation. Nevertheless, this method suffers from character-level and syllable-level ambiguities in determining word boundaries. This paper proposes a two-step approach to Thai word segmentation. First, an application of Prediction by Partial Matching segments the text into syllables, whose structures are better defined. This reduces the former, character-level type of ambiguity. Then, the syllables are co...
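Longest Matching, as commonly described, scans the text left to right and, at each position, takes the longest dictionary entry that matches there. The sketch below is a minimal illustration of that greedy rule, not the paper's implementation. It assumes a toy English dictionary in place of a Thai lexicon, and the fallback of emitting a single character when nothing matches is a hypothetical choice (implementations differ). The function name `longest_matching` is also hypothetical.

```python
def longest_matching(text, dictionary):
    """Greedy left-to-right Longest Matching over a set of known words.

    Hypothetical fallback: if no dictionary word starts at the current
    position, emit that single character and move on.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown input: single-character token
        words.append(match)
        i += len(match)
    return words


# Toy example: the greedy choice "god" prevents the intended "go" + "dear".
print(longest_matching("godear", {"go", "god", "od", "dear"}))
# ['god', 'e', 'a', 'r']
```

The toy example shows the boundary problem the abstract describes. Once the greedy step commits to the longest match, it never reconsiders a shorter one that would have let the rest of the string segment cleanly.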
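The following sketch is a rough illustration of how a Prediction by Partial Matching (PPM) character model can place boundaries. It trains a simplified PPM model on text in which boundaries are marked with a hypothetical `|` symbol. The simplifications are escape method C, no exclusions, and a fixed maximum context order. At each gap between characters, it then decides greedily whether inserting `|` makes the next character more probable. The names `PPMModel`, `segment`, and `max_order`, the marker symbol, and the greedy decision rule are all assumptions. This abstract does not give the paper's actual PPM configuration or search procedure, and a fuller approach could score whole segmentations (for example, with a Viterbi-style search) instead of deciding one gap at a time.

```python
from collections import Counter, defaultdict


class PPMModel:
    """Simplified PPM character model (escape method C, no exclusions)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        self.counts = defaultdict(Counter)  # context string -> next-symbol counts
        self.alphabet = set()

    def train(self, text):
        # Record every symbol under each of its preceding contexts of order 0..max_order.
        for i, ch in enumerate(text):
            self.alphabet.add(ch)
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[text[i - k:i]][ch] += 1

    def prob(self, context, symbol):
        # Back off from the longest usable context, paying an escape cost
        # d / (n + d) at each seen context that lacks the symbol.
        p_escape = 1.0
        for k in range(min(self.max_order, len(context)), -1, -1):
            table = self.counts.get(context[len(context) - k:])
            if not table:
                continue  # unseen context: fall back without an escape cost
            total, distinct = sum(table.values()), len(table)
            if symbol in table:
                return p_escape * table[symbol] / (total + distinct)
            p_escape *= distinct / (total + distinct)
        # Order -1: uniform over known symbols plus one slot for novel ones.
        return p_escape / (len(self.alphabet) + 1)


def segment(model, text, boundary="|"):
    """Greedily insert `boundary` wherever it makes the next character likelier."""
    if not text:
        return ""
    out = text[0]
    for ch in text[1:]:
        p_join = model.prob(out, ch)
        p_split = model.prob(out, boundary) * model.prob(out + boundary, ch)
        if p_split > p_join:
            out += boundary
        out += ch
    return out


model = PPMModel(max_order=2)
model.train("ab|ab|ab|ab")          # toy training text with marked boundaries
print(segment(model, "ababab"))     # ab|ab|ab
```

For the syllable step described above, the training text would be syllable-segmented Thai with boundaries marked. No such data is included here, so the toy Latin-alphabet string only demonstrates the mechanics.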