Paper: Compressing Trigram Language Models With Golomb Coding

ACL ID D07-1021
Title Compressing Trigram Language Models With Golomb Coding
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2007
Authors

Trigram language models are compressed using a Golomb coding method inspired by the original Unix spell program. Compression methods trade off space, time and accuracy (loss). The proposed HashTBO method optimizes space at the expense of time and accuracy. Trigram language models are normally considered memory hogs, but with HashTBO, it is possible to squeeze a trigram language model into a few megabytes or less. HashTBO made it possible to ship a trigram contextual speller in Microsoft Office 2007.
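Golomb coding, the paper's core compression primitive, assigns short codes to small integers and is near-optimal when the integers (here, gaps between successive hash values of stored n-grams) follow a geometric distribution. The sketch below is a generic Golomb encoder/decoder, not the paper's HashTBO implementation: an integer n is split into a quotient q = n // m written in unary and a remainder r = n % m written in truncated binary. The parameter m is chosen to match the expected gap size.

```python
def golomb_encode(n, m):
    """Encode non-negative integer n with Golomb parameter m; returns a bit string.

    Generic Golomb coding sketch; the paper applies this idea to hash-gap
    sequences, but this is not the HashTBO code itself.
    """
    q, r = divmod(n, m)
    out = "1" * q + "0"                # quotient in unary, 0-terminated
    b = (m - 1).bit_length()           # b = ceil(log2(m))
    threshold = (1 << b) - m           # remainders below this use b-1 bits
    if b == 0:                         # m == 1: remainder is always 0
        return out
    if r < threshold:
        out += format(r, "0{}b".format(b - 1))
    else:
        out += format(r + threshold, "0{}b".format(b))
    return out


def golomb_decode(bits, m):
    """Decode one Golomb-coded integer from a bit string.

    Returns (value, bits_consumed) so a caller can decode a concatenated stream.
    """
    i = 0
    q = 0
    while bits[i] == "1":              # count unary 1s for the quotient
        q += 1
        i += 1
    i += 1                             # skip the 0 terminator
    b = (m - 1).bit_length()
    threshold = (1 << b) - m
    if b == 0:
        return q * m, i
    r = int(bits[i:i + b - 1], 2) if b > 1 else 0
    i += b - 1
    if r >= threshold:                 # remainder needs one more bit
        r = r * 2 + int(bits[i]) - threshold
        i += 1
    return q * m + r, i
```

For example, with m = 3, the value 7 encodes as quotient 2 ("110") plus remainder 1 ("10"), i.e. "11010"; decoding recovers 7. A concatenated stream of such codes can be decoded greedily because every codeword is self-delimiting, which is what lets a sorted list of hash gaps be stored as one compact bit string.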