Paper: Accurate Word Segmentation using Transliteration and Language Model Projection

ACL ID P13-2033
Title Accurate Word Segmentation using Transliteration and Language Model Projection
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2013
Authors

Transliterated compound nouns that are not separated by whitespace pose difficulties for word segmentation (WS). Offline approaches have been proposed to split them using word statistics, but they rely on a static lexicon, which limits their use. We propose an online approach, integrating a source LM and/or back-transliteration and an English LM. Experiments on Japanese and Chinese WS show that the proposed models achieve significant improvements over the state of the art, reducing errors by 16% in Japanese.
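
To make the idea of scoring a split with back-transliteration and an English LM more concrete, here is a minimal sketch. It is not the paper's model: it enumerates every split exhaustively and scores each one with a unigram English LM, whereas the abstract describes an online approach integrated into WS. The table `BACK_TRANSLIT`, the probabilities in `ENGLISH_UNIGRAM_LOGP`, the penalty `UNKNOWN_LOGP`, and all function names are hypothetical values made up for illustration.

```python
import math

# Toy, made-up resources for illustration only (not taken from the paper).
# Hypothetical back-transliteration table: katakana piece -> English word.
BACK_TRANSLIT = {"ジャンク": "junk", "フード": "food", "ジャン": "jean"}

# Hypothetical English unigram log-probabilities.
ENGLISH_UNIGRAM_LOGP = {
    "junk": math.log(1e-4),
    "food": math.log(1e-3),
    "jean": math.log(1e-5),
}

# Assumed penalty for a piece that has no back-transliteration.
UNKNOWN_LOGP = math.log(1e-9)


def segmentations(text):
    """Yield every way to split text into contiguous, non-empty pieces."""
    if not text:
        yield []
        return
    for i in range(1, len(text) + 1):
        for rest in segmentations(text[i:]):
            yield [text[:i]] + rest


def score(pieces):
    """Sum the English unigram log-probabilities of the back-transliterated pieces."""
    total = 0.0
    for piece in pieces:
        english = BACK_TRANSLIT.get(piece)  # None if the piece is unknown
        total += ENGLISH_UNIGRAM_LOGP.get(english, UNKNOWN_LOGP)
    return total


def best_split(text):
    """Return the highest-scoring split under the toy projected LM."""
    return max(segmentations(text), key=score)


if __name__ == "__main__":
    print(best_split("ジャンクフード"))
```

With these toy values, splitting the compound into two pieces that each back-transliterate to a known English word scores higher than leaving it unsplit, because the unsplit string has no back-transliteration and receives the unknown penalty. Exhaustive enumeration grows exponentially with string length, so it is only suitable for short illustrative inputs.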