Paper: A Modified Joint Source-Channel Model For Transliteration

ACL ID P06-2025
Title A Modified Joint Source-Channel Model For Transliteration
Venue Annual Meeting of the Association for Computational Linguistics
Session Poster Session
Year 2006

Most machine transliteration systems transliterate out-of-vocabulary (OOV) words through an intermediate phonemic mapping. A framework has been presented that allows direct orthographical mapping between two languages that are of different origins and employ different alphabet sets. A modified joint source-channel model, along with a number of alternatives, has been proposed. Aligned transliteration units, along with their context, are automatically derived from a bilingual training corpus to generate the collocational statistics. The transliteration units in Bengali words take the pattern C+M, where C represents a vowel, a consonant, or a conjunct and M represents the vowel modifier or matra. The English transliteration units are of the form C*V*, where C represents a consonant and V represents...
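
The C*V* pattern for English transliteration units can be illustrated with a small sketch. It is not the authors' implementation: the function name `segment_english` and the vowel set are hypothetical, and it assumes that V stands for one of the vowel letters a, e, i, o, u and that every other character counts as C.

```python
import re

# Assumption: V is one of these vowel letters; anything else is treated as C.
VOWELS = "aeiou"

# One transliteration unit: zero or more consonants followed by zero or more vowels.
TU_PATTERN = re.compile(rf"[^{VOWELS}]*[{VOWELS}]*")


def segment_english(word):
    """Split a word into C*V* units (hypothetical helper, for illustration only)."""
    word = word.lower()
    # finditer also yields an empty match at the end of the string; drop it.
    return [m.group(0) for m in TU_PATTERN.finditer(word) if m.group(0)]


if __name__ == "__main__":
    print(segment_english("kolkata"))  # ['ko', 'lka', 'ta']
```

Under these assumptions each unit greedily absorbs a consonant run and the vowel run that follows it, so concatenating the units always reproduces the lowercased input word.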