Paper: An Iterative Algorithm To Build Chinese Language Models

ACL ID P96-1019
Title An Iterative Algorithm To Build Chinese Language Models
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 1996

We present an iterative procedure to build a Chinese language model (LM). We segment Chinese text into words based on a word-based Chinese language model. However, the construction of a Chinese LM itself requires word boundaries. To get out of the chicken-and-egg problem, we propose an iterative procedure that alternates two operations: segmenting text into words and building an LM. Starting with an initial segmented corpus and an LM based upon it, we use a Viterbi-like algorithm to segment another set of data. Then, we build an LM based on the second set and use the resulting LM to segment again the first corpus. The alternating procedure provides a self-organized way for the segmenter to detect automatically unseen words and correct segmentation errors. Our prelimi- ...
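
The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a unigram word model, a fixed maximum word length, and a flat penalty for unseen single characters, none of which are stated in the abstract. The names `build_lm`, `segment`, `iterate`, `max_len`, and `unk_logp` are hypothetical, and the sketch does not model how new multi-character words are discovered.

```python
import math
from collections import Counter


def build_lm(segmented_corpus):
    """Unigram LM (assumption): relative frequency of each word."""
    counts = Counter(w for sent in segmented_corpus for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def segment(text, lm, max_len=4, unk_logp=-20.0):
    """Viterbi-style dynamic programming over split points.

    best[i] holds the highest log-probability of any segmentation of
    text[:i]; back[i] holds the start index of the last word on that path.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word in lm:
                logp = math.log(lm[word])
            elif len(word) == 1:
                logp = unk_logp  # assumed penalty for an unseen character
            else:
                continue  # unseen multi-character strings are not proposed
            if best[j] + logp > best[i]:
                best[i] = best[j] + logp
                back[i] = j
    # Follow the back-pointers from the end of the text to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def iterate(seg_a, raw_b, rounds=3):
    """Alternate between segmenting one corpus and rebuilding the LM
    from it, starting from an initial segmented corpus seg_a."""
    raw_a = ["".join(sent) for sent in seg_a]
    lm = build_lm(seg_a)
    seg_b = []
    for _ in range(rounds):
        seg_b = [segment(s, lm) for s in raw_b]  # segment the second set
        lm = build_lm(seg_b)                     # LM from the second set
        seg_a = [segment(s, lm) for s in raw_a]  # re-segment the first corpus
        lm = build_lm(seg_a)                     # LM from the first corpus
    return seg_a, seg_b, lm
```

With Latin letters standing in for Chinese characters, a seed corpus `[["ab", "c"], ["ab", "d"]]` yields an LM under which `segment("abc", lm)` returns `["ab", "c"]`, because the path through the known word `ab` scores far higher than any path through the penalized single character `a`.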