Paper: A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers

ACL ID P09-2008
Title A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2009
Authors

Most NLP applications assume that user input is error-free; as a result, word segmentation (WS) for written languages that use word boundary markers (WBMs), such as spaces, has been regarded as a trivial issue. However, noisy real-world texts, such as blogs, e-mails, and SMS messages, may contain spacing errors that must be corrected before further processing can take place. For the Korean language, many researchers have adopted a traditional WS approach, which eliminates all spaces in the user input and re-inserts proper word boundaries. Unfortunately, because no WS model is perfect, such an approach often degrades the word spacing quality of user input that has few or no spacing errors. In this paper, we propose a novel WS method that ta...
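The traditional approach described in the abstract (strip every space, then re-insert boundaries) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the greedy longest-match segmenter is a toy stand-in for an imperfect WS model, not the method proposed in the paper or any system it evaluates, and the English vocabulary `TOY_VOCAB` and the function name `resegment` are hypothetical.

```python
# Toy illustration of the "remove all spaces, then re-segment" strategy.
# The greedy longest-match segmenter is a stand-in for an imperfect WS
# model; it is NOT the paper's method. The vocabulary is hypothetical.

TOY_VOCAB = {"the", "theta", "table", "down", "there"}


def resegment(text, vocab=TOY_VOCAB):
    """Discard the user's spaces and re-insert boundaries greedily."""
    chars = text.replace(" ", "")           # traditional step 1: drop all WBMs
    max_len = max(len(word) for word in vocab)
    tokens = []
    i = 0
    while i < len(chars):
        # Try the longest candidate first; fall back to a single character
        # when no vocabulary entry matches at this position.
        for length in range(min(max_len, len(chars) - i), 0, -1):
            candidate = chars[i:i + length]
            if candidate in vocab or length == 1:
                tokens.append(candidate)
                i += length
                break
    return " ".join(tokens)                  # traditional step 2: re-insert WBMs


if __name__ == "__main__":
    # Input with a spacing error: the toy model repairs it.
    print(resegment("downthere"))
    # Input that was already correct: the toy model damages it.
    print(resegment("the table down there"))
```

With this toy vocabulary, the unspaced input `downthere` is repaired to `down there`, but the already correct input `the table down there` comes back as `theta b l e down there`, because the user's original spaces were thrown away before segmentation. This is the failure mode the abstract attributes to the traditional approach when the input has few or no spacing errors.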