Paper: An Efficient Language Model Using Double-Array Structures

ACL ID: D13-1023
Title: An Efficient Language Model Using Double-Array Structures
Venue: Conference on Empirical Methods in Natural Language Processing
Session: Main Conference
Year: 2013

N-gram language models tend to grow in size as the corpus grows, and consume considerable resources. In this paper, we propose an efficient method for implementing N-gram models based on double-array structures. First, we propose a method for representing backwards suffix trees using double-array structures and demonstrate its efficiency. Next, we propose two optimization methods for improving the efficiency of data representation in the double-array structures. Embedding probabilities into unused spaces in double-array structures reduces the model size. Moreover, tuning the word IDs in the language model makes the model smaller and faster. We also show that our method can be used for building large language models using the division method. Lastly, we show that our m...
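The core data structure can be illustrated with a minimal Python sketch of a double-array trie over integer word IDs: moving from node `s` on word ID `c` goes to slot `base[s] + c`, and the move is valid only if `check[t] == s`. The sketch assumes, as in a backward suffix tree, that each N-gram is inserted with its words in reverse order. It also stores log-probabilities in a separate `value` array instead of embedding them in unused slots as the abstract proposes. The class name `DoubleArrayTrie`, the toy vocabulary, and the probabilities are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class DoubleArrayTrie:
    """Hypothetical minimal double-array trie over word IDs (IDs start at 1).

    Node 0 is the root. A transition from node s on word ID c goes to
    t = base[s] + c and is valid only if check[t] == s.
    """

    def __init__(self, entries: Dict[Tuple[int, ...], float]) -> None:
        self.base: List[int] = [0]
        self.check: List[int] = [-1]
        self.value: List[Optional[float]] = [None]
        # Build a nested-dict trie first, then lay it out in the arrays.
        tree: dict = {}
        for key, logprob in entries.items():
            node = tree
            for c in key:
                node = node.setdefault(c, {})
            node[None] = logprob  # the None key holds this node's value
        self._place(0, tree)

    def _ensure(self, size: int) -> None:
        # Grow all arrays; check == -1 marks an unused slot.
        while len(self.base) < size:
            self.base.append(0)
            self.check.append(-1)
            self.value.append(None)

    def _place(self, s: int, node: dict) -> None:
        if None in node:
            self.value[s] = node[None]
        labels = sorted(c for c in node if c is not None)
        if not labels:
            return
        # Search for the smallest base offset whose child slots are all free.
        b = 1
        while True:
            self._ensure(b + labels[-1] + 1)
            if all(self.check[b + c] == -1 for c in labels):
                break
            b += 1
        self.base[s] = b
        # Reserve every child slot before recursing so descendants cannot take them.
        for c in labels:
            self.check[b + c] = s
        for c in labels:
            self._place(b + c, node[c])

    def lookup(self, key: Sequence[int]) -> Optional[float]:
        """Follow word IDs from the root; return the stored value or None."""
        s = 0
        for c in key:
            t = self.base[s] + c
            if self.base[s] == 0 or t >= len(self.check) or self.check[t] != s:
                return None
            s = t
        return self.value[s]


# Hypothetical toy vocabulary and log-probabilities, for illustration only.
vocab = {"<s>": 1, "the": 2, "cat": 3}
ngrams = {("the",): -1.2, ("the", "cat"): -0.7, ("<s>", "the"): -0.4}
# Backward-suffix-tree assumption: insert each N-gram with its words reversed.
entries = {tuple(vocab[w] for w in reversed(ng)): lp for ng, lp in ngrams.items()}
trie = DoubleArrayTrie(entries)
print(trie.lookup([vocab["cat"], vocab["the"]]))  # -0.7
```

Slots whose `check` entry is still -1 after construction are the unused spaces between nodes, which is the kind of gap the first optimization fills with probabilities. Because each child lands at `base[s] + c`, the numeric word IDs directly determine how tightly siblings pack, which is presumably why reassigning word IDs can shrink the arrays.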