Paper: Automatic Learning Of Language Model Structure

ACL ID C04-1022
Title Automatic Learning Of Language Model Structure
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2004

Statistical language modeling remains a challenging task, in particular for morphologically rich languages. Recently, new approaches based on factored language models have been developed to address this problem. These models provide principled ways of including additional conditioning variables other than the preceding words, such as morphological or syntactic features. However, the number of possible choices for model parameters creates a large space of models that cannot be searched exhaustively. This paper presents an entirely data-driven model selection procedure based on genetic search, which is shown to outperform both knowledge-based and random selection procedures on two different language modeling tasks (Arabic and Turkish).
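
To illustrate the general idea of genetic search over model structures, the following is a minimal sketch and not the paper's actual procedure. It assumes that a candidate model is encoded only as a bit string selecting which conditioning factors to use. The factor names in `FACTORS`, the function names, the hyperparameters, and above all `toy_fitness` are hypothetical placeholders; in a real setting the fitness would presumably come from training a factored language model and scoring it on held-out data.

```python
import random

# Hypothetical conditioning factors (not taken from the paper).
FACTORS = ["word[-1]", "word[-2]", "stem[-1]", "stem[-2]", "tag[-1]", "tag[-2]"]


def toy_fitness(genome):
    """Placeholder score: counts bits that match an arbitrary target pattern.

    This is NOT a language model score. A real fitness function would train
    the model described by `genome` and return, e.g., negative perplexity.
    """
    target = (1, 0, 1, 0, 1, 0)
    return sum(g == t for g, t in zip(genome, target))


def genetic_search(fitness, n_genes, pop_size=20, generations=30,
                   p_mut=0.05, seed=0):
    """Generic genetic search over fixed-length bit strings (higher is better)."""
    rng = random.Random(seed)
    # Random initial population of bit-string genomes.
    pop = [tuple(rng.randint(0, 1) for _ in range(n_genes))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def select(population):
        # Binary tournament: the fitter of two random individuals survives.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best]  # elitism: always carry over the best genome so far
        while len(children) < pop_size:
            p1, p2 = select(pop), select(pop)
            cut = rng.randint(1, n_genes - 1)      # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability p_mut.
            child = tuple(1 - g if rng.random() < p_mut else g for g in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    best = genetic_search(toy_fitness, len(FACTORS))
    chosen = [name for name, bit in zip(FACTORS, best) if bit]
    print("selected factors:", chosen)
```

Because the best genome is always carried over, the best score never decreases from one generation to the next. Encoding further structural choices as additional genes would follow the same pattern, with the search loop itself left unchanged.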