Paper: Latent Class Transliteration based on Source Language Origin

ACL ID P11-2010
Title Latent Class Transliteration based on Source Language Origin
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

Transliteration, a rich source of proper noun spelling variations, is usually recognized by phonetic- or spelling-based models. However, a single model cannot handle words of different language origins, e.g., “get” in “piaget” and “target.” Li et al. (2007) propose a method which explicitly models and classifies the source language origins and switches transliteration models accordingly. This model, however, requires a training set explicitly tagged with language origins. We propose a novel method which models language origins as latent classes. The parameters are learned from a set of transliterated word pairs via the EM algorithm. The experimental results of the transliteration task of Western names to Japanese show that the proposed model can achiev...
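The idea of treating language origin as a latent class learned by EM can be illustrated with a small mixture model. The following is a minimal sketch, not the authors' implementation. It assumes that each word pair has already been segmented into aligned (source, target) units and that units are independent given the class. The function names `train_latent_class_em` and `class_posterior`, the smoothing constants, and the toy segmentations are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def class_posterior(pair, class_prior, unit_prob):
    """E-step for one word pair: P(class | pair) under the mixture."""
    log_joint = []
    for z, prior in enumerate(class_prior):
        # Assumption: units unseen in training get a crude floor probability.
        score = math.log(prior)
        score += sum(math.log(unit_prob[z].get(u, 1e-12)) for u in pair)
        log_joint.append(score)
    top = max(log_joint)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def train_latent_class_em(pairs, num_classes=2, iterations=20,
                          smoothing=1e-3, seed=0):
    """Fit P(class) and P(unit | class) on pre-aligned transliteration pairs."""
    rng = random.Random(seed)
    vocab = sorted({u for pair in pairs for u in pair})

    # Random initialisation breaks the symmetry between classes.
    unit_prob = []
    for _ in range(num_classes):
        raw = {u: rng.random() + 0.5 for u in vocab}
        norm = sum(raw.values())
        unit_prob.append({u: w / norm for u, w in raw.items()})
    class_prior = [1.0 / num_classes] * num_classes

    for _ in range(iterations):
        prior_counts = [0.0] * num_classes
        unit_counts = [defaultdict(float) for _ in range(num_classes)]

        # E-step: accumulate expected counts weighted by class posteriors.
        for pair in pairs:
            post = class_posterior(pair, class_prior, unit_prob)
            for z, gamma in enumerate(post):
                prior_counts[z] += gamma
                for u in pair:
                    unit_counts[z][u] += gamma

        # M-step: re-normalise the smoothed expected counts.
        denom = len(pairs) + smoothing * num_classes
        class_prior = [(c + smoothing) / denom for c in prior_counts]
        for z in range(num_classes):
            total = sum(unit_counts[z].values()) + smoothing * len(vocab)
            unit_prob[z] = {u: (unit_counts[z][u] + smoothing) / total
                            for u in vocab}

    return class_prior, unit_prob


# Toy data (invented segmentations): the same source chunk "get" maps to
# different Japanese units depending on the word it appears in.
toy_pairs = [
    [("pia", "ピア"), ("get", "ジェ")],    # piaget
    [("tar", "ター"), ("get", "ゲット")],  # target
    [("mo", "モ"), ("net", "ネ")],         # monet
    [("mar", "マー"), ("ket", "ケット")],  # market
]
prior, unit_prob = train_latent_class_em(toy_pairs, num_classes=2)
posterior = class_posterior(toy_pairs[0], prior, unit_prob)
```

Under these assumptions no origin labels are needed: the class posteriors computed in the E-step play the role that explicit language-origin tags play in a supervised class-switching model. Which class index ends up associated with which group of words depends on the random initialisation.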