Paper: A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing

ACL ID P14-1108
Title A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2014
Authors

We introduce a novel approach for building language models based on a systematic, recursive exploration of skip n-gram models which are interpolated using modified Kneser-Ney smoothing. Our approach generalizes language models as it contains the classical interpolation with lower order models as a special case. In this paper we motivate, formalize and present our approach. In an extensive empirical experiment over English text corpora we demonstrate that our generalized language models lead to a substantial reduction of perplexity between 3.1% and 12.7% in comparison to traditional language models using modified Kneser-Ney smoothing. Furthermore, we investigate the behaviour over three other languages and a domain specific corpus where we observed consistent improvements. Fin...
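
To make the phrase "systematic, recursive exploration of skip n-gram models" more concrete, the following is a minimal sketch of how the skipped contexts for one n-gram could be enumerated. It is an illustration under stated assumptions, not the authors' implementation: the wildcard symbol `*` and the names `skip_once` and `explore` are hypothetical, and the sketch only enumerates contexts; it does not perform counting or modified Kneser-Ney smoothing.

```python
from typing import List, Set, Tuple

Context = Tuple[str, ...]

WILDCARD = "*"  # hypothetical placeholder marking a skipped word


def skip_once(context: Context) -> List[Context]:
    """Return every context obtained by skipping exactly one more word."""
    results = []
    for i, word in enumerate(context):
        if word != WILDCARD:
            # Replace the word at position i with the wildcard.
            results.append(context[:i] + (WILDCARD,) + context[i + 1:])
    return results


def explore(context: Context) -> Set[Context]:
    """Recursively collect all contexts reachable by repeated skipping."""
    seen = {context}
    for skipped in skip_once(context):
        seen |= explore(skipped)
    return seen


if __name__ == "__main__":
    for ctx in sorted(explore(("the", "quick", "brown"))):
        print(" ".join(ctx))
```

In this sketch, skipping only the leftmost remaining word at each step yields the contexts that classical lower order interpolation uses, which is one way to read the abstract's statement that the classical scheme is contained as a special case.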