Paper: Smoothed marginal distribution constraints for language modeling

ACL ID P13-1005
Title Smoothed marginal distribution constraints for language modeling
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2013

We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff n-gram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as par...
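To make the notion of a marginal distribution constraint concrete, the following minimal sketch checks whether a toy backoff bigram model reproduces its unigram distribution, i.e. whether the sum over histories h of P(h) P(w | h) equals P(w). It is an illustration only and is not the re-estimation algorithm of the paper. The function names, the toy corpus, the circular treatment of the corpus (so that the history distribution equals the unigram distribution), and the unsmoothed maximum-likelihood estimates are all assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction


def unigram_probs(tokens):
    """Relative-frequency unigram distribution P(w)."""
    counts = Counter(tokens)
    return {w: Fraction(c, len(tokens)) for w, c in counts.items()}


def ml_bigram_probs(tokens):
    """Maximum-likelihood P(w | h), keyed by (h, w).

    Assumption: the corpus is treated as circular (the last token is
    followed by the first), so every token occurs once as a history.
    """
    pairs = Counter(zip(tokens, tokens[1:] + tokens[:1]))
    hist = Counter(tokens)
    return {(h, w): Fraction(c, hist[h]) for (h, w), c in pairs.items()}


def backoff_prob(w, h, bigrams, unigrams):
    """P(w | h) under a standard backoff scheme.

    Explicit bigrams are used as stored; all other words receive the
    unigram probability scaled by a backoff weight chosen so that
    P(. | h) sums to one.
    """
    if (h, w) in bigrams:
        return bigrams[(h, w)]
    explicit = [v for (g, v) in bigrams if g == h]
    kept_mass = sum(bigrams[(h, v)] for v in explicit)
    lower_mass = sum(unigrams[v] for v in explicit)
    alpha = (Fraction(1) - kept_mass) / (Fraction(1) - lower_mass)
    return alpha * unigrams[w]


def implied_marginal(bigrams, unigrams):
    """Sum over histories h of P(h) * P(w | h), for every word w."""
    return {
        w: sum(unigrams[h] * backoff_prob(w, h, bigrams, unigrams)
               for h in unigrams)
        for w in unigrams
    }


tokens = "a b a c a b".split()
uni = unigram_probs(tokens)
full = ml_bigram_probs(tokens)

# Prune one explicit bigram; its mass is now reached only via backoff.
pruned = {k: v for k, v in full.items() if k != ("a", "c")}

print(implied_marginal(full, uni) == uni)    # unpruned model keeps the marginal
print(implied_marginal(pruned, uni)["c"], "vs target", uni["c"])
```

In this toy setting the unpruned model reproduces the unigram marginal exactly, while pruning a single bigram leaves the word "c" with an implied marginal of 1/24 against a target of 1/6. A mismatch of this kind is what marginal distribution constraints are meant to remove.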