Paper: Discriminative Pruning Of Language Models For Chinese Word Segmentation

ACL ID P06-1126
Title Discriminative Pruning Of Language Models For Chinese Word Segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006
Authors

This paper presents a discriminative pruning method for n-gram language models used in Chinese word segmentation. To reduce the size of the language model in a Chinese word segmentation system, the importance of each bigram is computed with a discriminative pruning criterion that is related to the performance loss caused by pruning that bigram. We then propose a step-by-step growing algorithm to build a language model of the desired size. Experimental results show that the discriminative pruning method yields a much smaller model than a model pruned with the state-of-the-art method. At the same Chinese word segmentation F-measure, the number of bigrams in the model can be reduced by up to 90%. Correlation between language model perplexity and word segmentation ...
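The abstract describes the criterion and the growing algorithm only at a high level, so the following is a minimal toy sketch, not the authors' method. It assumes (hypothetically) that a bigram's importance is the reduction in segmentation errors on a held-out set when that bigram is added to the current model, and that each growing step adds the single most important bigram and then re-scores the rest. The lexicon `UNIGRAM`, the constant `BACKOFF`, the data `FULL` and `DEV`, and the functions `segment`, `errors`, and `grow` are all invented for illustration. The backoff is a simplified fixed penalty, not a normalized backoff model.

```python
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]
Model = Dict[Bigram, float]  # bigram -> log-probability

# Hypothetical toy lexicon: word -> unigram log-probability.
UNIGRAM: Dict[str, float] = {
    "a": -1.0, "b": -1.0, "ab": -2.5,
    "c": -1.0, "d": -1.0, "cd": -2.5,
}
BACKOFF = -1.0  # hypothetical fixed penalty used when a bigram is absent


def word_score(prev: str, word: str, model: Model) -> float:
    """Bigram log-prob if the bigram is kept, else back off to the unigram."""
    if (prev, word) in model:
        return model[(prev, word)]
    return BACKOFF + UNIGRAM[word]


def segment(text: str, model: Model, max_len: int = 4) -> List[str]:
    """Viterbi segmentation; the DP state is (end position, previous word)."""
    best: List[Dict[str, Tuple[float, List[str]]]] = [{} for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(len(text)):
        for prev, (score, words) in best[i].items():
            for j in range(i + 1, min(len(text), i + max_len) + 1):
                word = text[i:j]
                if word not in UNIGRAM:
                    continue
                s = score + word_score(prev, word, model)
                if word not in best[j] or s > best[j][word][0]:
                    best[j][word] = (s, words + [word])
    if not best[-1]:
        return [text]  # no lexicon path: keep the whole string as one token
    return max(best[-1].values(), key=lambda t: t[0])[1]


def errors(model: Model, dev: List[Tuple[str, List[str]]]) -> int:
    """Count held-out sentences whose segmentation differs from the gold one."""
    return sum(segment(text, model) != gold for text, gold in dev)


def grow(full: Model, dev: List[Tuple[str, List[str]]], target_size: int) -> Model:
    """Greedily add the bigram whose inclusion most reduces segmentation errors."""
    model: Model = {}
    while len(model) < min(target_size, len(full)):
        base = errors(model, dev)
        best_gain, best_bigram = None, None
        for bigram in sorted(full):  # sorted for deterministic tie-breaking
            if bigram in model:
                continue
            trial = {**model, bigram: full[bigram]}
            gain = base - errors(trial, dev)  # importance w.r.t. the current model
            if best_gain is None or gain > best_gain:
                best_gain, best_bigram = gain, bigram
        model[best_bigram] = full[best_bigram]
    return model


FULL: Model = {("a", "b"): -0.5, ("c", "d"): -0.5, ("<s>", "ab"): -1.0}
DEV = [("ab", ["a", "b"]), ("cd", ["c", "d"])]

print(errors({}, DEV))                     # 2
small = grow(FULL, DEV, target_size=2)
print(sorted(small), errors(small, DEV))   # [('a', 'b'), ('c', 'd')] 0
```

In this toy, the bigram `("<s>", "ab")` is not selected at size 2 because adding it would make the segmenter prefer the wrong split of `"ab"`; ranking bigrams by their effect on segmentation errors, rather than by their effect on the language model alone, is the intuition a discriminative criterion captures. Re-scoring after every addition is one possible design choice here; it is costly but accounts for interactions between bigrams.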