Paper: Language Model Rest Costs and Space-Efficient Storage

ACL ID D12-1107
Title Language Model Rest Costs and Space-Efficient Storage
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012
Authors

Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level scores, at the expense of less accurate estimates.
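The first contribution can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tables, words, and log10 values are all hypothetical, and the full model is fixed at order 3 for brevity. "Common practice" scores the fragment's first words from the lower-order entries stored inside the trigram model, while the proposed rest cost scores the first word with a separately trained unigram model and the second with a bigram model.

```python
# Toy log10 values, purely illustrative. `lower_order` mimics the
# lower-order entries stored inside a hypothetical trigram model
# (common practice); `unigram`/`bigram` stand in for separately
# trained same-order models, as the paper proposes for rest costs.
lower_order = {("the",): -1.2, ("the", "cat"): -0.9}
unigram     = {("the",): -1.0}
bigram      = {("the", "cat"): -0.8}
trigram     = {("the", "cat", "sat"): -0.5}

def fragment_estimate(fragment, first_table, second_table):
    """Estimate a fragment whose left context is unknown: score the
    first word from `first_table`, the second from `second_table`,
    and every later word with the full trigram model."""
    total = 0.0
    for i, word in enumerate(fragment):
        if i == 0:
            total += first_table[(word,)]
        elif i == 1:
            total += second_table[tuple(fragment[:2])]
        else:
            total += trigram[tuple(fragment[i - 2:i + 1])]
    return total

frag = ["the", "cat", "sat"]
common   = fragment_estimate(frag, lower_order, lower_order)  # -2.6
proposed = fragment_estimate(frag, unigram, bigram)           # -2.3
```

Both variants score the third word identically; they differ only on the short-context positions, which is where the estimate-versus-memory trade-off lives.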