Paper: An improved MDL-based compression algorithm for unsupervised word segmentation

ACL ID P13-2030
Title An improved MDL-based compression algorithm for unsupervised word segmentation
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2013
Authors

We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. The proposed extension improves the baseline performance in both efficiency and accuracy on a standard benchmark.
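
To make the quantity named in the abstract concrete, the following minimal sketch computes the negative empirical pointwise mutual information, -log2( p(x, y) / (p(x) p(y)) ), for every adjacent token pair in a sequence. It is an illustration only, not the paper's algorithm: the function name negative_pmi, the character-level toy input, and the use of plain relative frequencies as probability estimates are all assumptions made here, and the abstract does not say how the scores enter the regularized compression objective.

```python
import math
from collections import Counter


def negative_pmi(tokens):
    """Map each adjacent pair (x, y) to -PMI(x, y) under empirical frequencies.

    Hypothetical helper for illustration; not taken from the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi            # empirical joint probability of the pair
        p_x = unigrams[x] / n_uni      # empirical marginal of the left token
        p_y = unigrams[y] / n_uni      # empirical marginal of the right token
        scores[(x, y)] = -math.log2(p_xy / (p_x * p_y))
    return scores


# Toy character sequence (assumed input; a real corpus would be far larger).
tokens = list("abababcc")
scores = negative_pmi(tokens)

# List pairs from most to least strongly associated (lowest score first).
for pair, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(pair, round(score, 3))
```

On this toy input, the pair with the lowest score is the one whose tokens co-occur most strongly relative to chance. Whether such pairs are the ones a compression-style segmenter would merge first is an assumption of this sketch rather than something the abstract states.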