Paper: A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining

ACL ID P12-1049
Title A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

We propose a novel model to automatically extract transliteration pairs from parallel corpora. Our model is efficient, language pair independent and mines transliteration pairs in a consistent fashion in both unsupervised and semi-supervised settings. We model transliteration mining as an interpolation of transliteration and non-transliteration sub-models. We evaluate on NEWS 2010 shared task data and on parallel corpora with competitive results.
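
The interpolation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple two-component mixture in which a weight `lam` (a hypothetical name) is placed on the non-transliteration sub-model, and it scores a word pair by the posterior probability of the non-transliteration component. The function names, the direction of the weight, and the example probabilities are illustrative assumptions, not details taken from the abstract.

```python
def mixture_probability(p_translit, p_non_translit, lam):
    """Interpolate two sub-model probabilities for one word pair.

    lam is the (assumed) weight on the non-transliteration sub-model;
    1 - lam is the weight on the transliteration sub-model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * p_translit + lam * p_non_translit


def posterior_non_translit(p_translit, p_non_translit, lam):
    """Posterior probability that the pair came from the non-transliteration component."""
    total = mixture_probability(p_translit, p_non_translit, lam)
    if total == 0.0:
        raise ValueError("both sub-models assign zero probability")
    return lam * p_non_translit / total


if __name__ == "__main__":
    # Hypothetical sub-model scores for a single word pair.
    score = posterior_non_translit(p_translit=0.5, p_non_translit=0.25, lam=0.5)
    # Under this sketch, a pair might be kept as a transliteration
    # when the posterior falls below a chosen threshold such as 0.5.
    print(f"posterior of non-transliteration: {score:.3f}")
```

In this sketch a low posterior suggests that the transliteration sub-model explains the pair better; the threshold used for the final decision is likewise an assumption.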