Paper: Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets

ACL ID D12-1031
Title Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012

We consider the problem of inducing grapheme-to-phoneme mappings for unknown languages written in a Latin alphabet. First, we collect a data-set of 107 languages with known grapheme-phoneme relationships, along with a short text in each language. We then cast our task in the framework of supervised learning, where each known language serves as a training example, and predictions are made on unknown languages. We induce an undirected graphical model that learns phonotactic regularities, thus relating textual patterns to plausible phonemic interpretations across the entire range of languages. Our model correctly predicts grapheme-phoneme pairs with over 88% F1-measure.
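
To make the reported metric concrete, the following is a minimal sketch of how an F1-measure over grapheme-phoneme pairs could be computed for a single language. The abstract does not spell out the evaluation protocol, so treating the predictions and the gold standard as sets of (grapheme, phoneme) pairs is an assumption, as are the function name `pair_f1` and the toy pairs, which are made up for illustration and are not taken from the paper's data.

```python
def pair_f1(predicted, gold):
    """Return (precision, recall, F1) over sets of (grapheme, phoneme) pairs.

    Assumption: a predicted pair counts as correct only if the exact same
    pair appears in the gold set for that language.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy language: a grapheme may map to more than one phoneme.
gold = {("c", "k"), ("c", "s"), ("a", "a"), ("j", "dʒ")}
predicted = {("c", "k"), ("a", "a"), ("j", "j")}

precision, recall, f1 = pair_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three predicted pairs are correct and two of the four gold pairs are recovered, so precision and recall differ and F1 is their harmonic mean. How scores would be aggregated across held-out languages is left open here, since the abstract does not say.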