Paper: An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment

ACL ID P11-1044
Title An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

We propose a language-independent method for the automatic extraction of transliteration pairs from parallel corpora. In contrast to previous work, our method uses no form of supervision, and does not require linguistically informed preprocessing. We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems that were submitted. We also apply our method to English/Hindi and English/Arabic parallel corpora and compare the results with manually built gold standards which mark transliterated word pairs. Finally, we integrate the transliteration module into the GIZA++ word aligner and evaluate it on two word alignment tasks achieving improvements in both precision and r...