Paper: Knowledge-Rich Morphological Priors for Bayesian Language Models

ACL ID N13-1140
Title Knowledge-Rich Morphological Priors for Bayesian Language Models
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

We present a morphology-aware nonparametric Bayesian model of language whose prior distribution uses manually constructed finite-state transducers to capture the word formation processes of particular languages. This relaxes the word independence assumption and enables sharing of statistical strength across, for example, stems or inflectional paradigms in different contexts. Our model can be used in virtually any scenario where multinomial distributions over words would be used. We obtain state-of-the-art results in language modeling, word alignment, and unsupervised morphological disambiguation for a variety of morphologically rich languages.
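To make the idea concrete, the sketch below shows how a morphology-aware prior can back a cached (Chinese-restaurant-process-style) word distribution, so that unseen inflected forms borrow probability mass from observed forms sharing a stem or suffix. This is only an illustration of the general mechanism the abstract describes: the toy analyze function, SUFFIXES list, and the MorphBaseDistribution and CRPUnigramModel classes are hypothetical stand-ins, not the paper's implementation, which uses hand-built weighted finite-state transducers and a richer nonparametric Bayesian model.

```python
import random
from collections import defaultdict

# Hypothetical stand-in for a hand-built FST: split a word into (stem, suffix)
# using a tiny suffix list. A real analyzer would encode full paradigms.
SUFFIXES = ["ing", "ed", "s", ""]

def analyze(word):
    """Return the (stem, suffix) split with the longest matching suffix."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if suf and word.endswith(suf) and len(word) > len(suf):
            return word[: -len(suf)], suf
    return word, ""

class MorphBaseDistribution:
    """Prior over words that shares statistical strength across stems and
    suffixes, approximating the role of the FST-defined base distribution."""
    def __init__(self, alpha=1.0, stem_vocab_size=1000):
        self.alpha = alpha
        self.stem_vocab_size = stem_vocab_size  # assumed stem vocabulary size
        self.stem_counts = defaultdict(float)
        self.suffix_counts = defaultdict(float)
        self.total = 0.0

    def prob(self, word):
        stem, suffix = analyze(word)
        # Additive-smoothed stem and suffix probabilities; a crude stand-in
        # for weighted-FST path probabilities.
        p_stem = (self.stem_counts[stem] + self.alpha) / (
            self.total + self.alpha * self.stem_vocab_size)
        p_suffix = (self.suffix_counts[suffix] + self.alpha) / (
            self.total + self.alpha * len(SUFFIXES))
        return p_stem * p_suffix

    def observe(self, word):
        stem, suffix = analyze(word)
        self.stem_counts[stem] += 1
        self.suffix_counts[suffix] += 1
        self.total += 1

class CRPUnigramModel:
    """Chinese-restaurant-process unigram cache: whole-word counts back off
    to the morphology-aware base distribution."""
    def __init__(self, base, concentration=1.0):
        self.base = base
        self.theta = concentration
        self.counts = defaultdict(int)
        self.n = 0

    def prob(self, word):
        p0 = self.base.prob(word)
        return (self.counts[word] + self.theta * p0) / (self.n + self.theta)

    def observe(self, word):
        p0 = self.base.prob(word)
        # With probability proportional to theta * p0, credit the base
        # distribution, which then updates its stem/suffix counts.
        if random.random() < (self.theta * p0) / (self.counts[word] + self.theta * p0):
            self.base.observe(word)
        self.counts[word] += 1
        self.n += 1

if __name__ == "__main__":
    model = CRPUnigramModel(MorphBaseDistribution())
    for w in "walk walked walking talks talked".split():
        model.observe(w)
    # "talking" is unseen, but shares the stem "talk" and suffix "ing" with
    # observed forms, so it receives more mass than an arbitrary unseen string.
    print(model.prob("talking"), model.prob("zzzz"))
```

In this toy setting, sharing at the stem and suffix level is what "relaxes the word independence assumption": evidence for walked and talks raises the prior probability of the unobserved form talking, which a plain multinomial over whole words could not do.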