ACL ID | P11-2005 |
---|---|
Title | An Empirical Investigation of Discounting in Cross-Domain Language Models |
Venue | Annual Meeting of the Association for Computational Linguistics |
Session | Main Conference |
Year | 2011 |
Authors |
We investigate the empirical behavior of n-gram discounts within and across domains. When a language model is trained and evaluated on two corpora from exactly the same domain, discounts are roughly constant, matching the assumptions of modified Kneser-Ney LMs. However, when training and test corpora diverge, the empirical discount grows essentially as a linear function of the n-gram count. We adapt a Kneser-Ney language model to incorporate such growing discounts, resulting in perplexity improvements over modified Kneser-Ney and Jelinek-Mercer baselines.
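
The notion of an empirical discount can be illustrated with a small sketch. The abstract does not give the exact definition used in the paper, so the code below assumes a common one: for all n-grams seen k times in training, the discount is k minus their average count in the test corpus, with test counts rescaled to the training corpus size. The function names, the `max_count` cutoff, and the rescaling step are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def empirical_discounts(train_tokens, test_tokens, n=2, max_count=5):
    """Estimate d(k) = k - (average rescaled test count) for each training count k.

    Assumed definition, for illustration only: n-gram types are grouped by
    their training count k (up to max_count), and their test counts are
    rescaled so both corpora have the same total number of n-gram tokens.
    """
    train_counts = Counter(ngrams(train_tokens, n))
    test_counts = Counter(ngrams(test_tokens, n))

    total_train = sum(train_counts.values())
    total_test = sum(test_counts.values())
    # Rescale test counts to the training corpus size (assumption).
    scale = total_train / total_test if total_test else 0.0

    test_sum = defaultdict(float)  # summed rescaled test counts per k
    num_types = defaultdict(int)   # number of n-gram types per k
    for gram, k in train_counts.items():
        if k <= max_count:
            test_sum[k] += test_counts[gram] * scale
            num_types[k] += 1

    return {k: k - test_sum[k] / num_types[k] for k in sorted(num_types)}
```

Under this definition, roughly constant values of d(k) across k would correspond to the same-domain behavior described above, while values that increase with k would correspond to the cross-domain case. Toy inputs are far too small to show either trend reliably.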