Paper: A Succinct N-gram Language Model

ACL ID P09-2086
Title A Succinct N-gram Language Model
Venue Annual Meeting of the Association of Computational Linguistics
Session Short Paper
Year 2009

Efficient processing of tera-scale text data is an important research topic. This pa- per proposes lossless compression of N- gram language models based on LOUDS, a succinct data structure. LOUDS suc- cinctly represents a trie with M nodes as a 2M +1bit string. We compress it further for the N-gram language model structure. We also use ‘variable length coding’and ‘block-wise compression’ to compress val- ues associated with nodes. Experimental results for three large-scale N-gram com- pression tasks achieved a significant com- pression rate without any loss.