Paper: A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

ACL ID N12-1043
Title A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

We investigate a language model that combines morphological and shape features with a Kneser-Ney model and test it in a large crosslingual study of European languages. Even though the model is generic and we use the same architecture and features for all languages, the model achieves reductions in perplexity for all 21 languages represented in the Europarl corpus, ranging from 3% to 11%. We show that almost all of this perplexity reduction can be achieved by identifying suffixes by frequency.
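
To illustrate what identifying suffixes by frequency could look like, the following is a minimal sketch and not the procedure used in the paper. It assumes that a suffix is any word-final character string of up to `max_len` characters, that suffixes are counted over word types in a vocabulary, and that the `top_n` most frequent ones are kept. The function names, parameters, and toy vocabulary are hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequent_suffixes(vocab, max_len=3, top_n=3):
    """Return the top_n most frequent word-final strings over word types.

    Assumption: every proper word-final string of length 1..max_len is a
    suffix candidate; the source does not specify this procedure.
    """
    counts = Counter()
    for word in vocab:
        # Only count proper suffixes, so a candidate is shorter than the word.
        for k in range(1, min(max_len, len(word) - 1) + 1):
            counts[word[-k:]] += 1
    return [suffix for suffix, _ in counts.most_common(top_n)]


def suffix_feature(word, suffixes):
    """Map a word to its longest matching frequent suffix ('' if none)."""
    matches = [s for s in suffixes if word.endswith(s) and len(s) < len(word)]
    return max(matches, key=len, default="")


# Toy vocabulary (hypothetical, not drawn from Europarl).
vocab = ["walking", "talking", "walked", "talked", "jumping", "cat"]
suffixes = frequent_suffixes(vocab, max_len=3, top_n=3)
```

In a class-based setup, the value returned by `suffix_feature` could serve as one morphological feature of a word, with words sharing the same feature values grouped together. How such features are combined with the Kneser-Ney model is not covered by this sketch.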