Paper: A Log-Linear Block Transliteration Model based on Bi-Stream HMMs

ACL ID N07-1046
Title A Log-Linear Block Transliteration Model based on Bi-Stream HMMs
Venue Human Language Technologies
Session Main Conference
Year 2007
Authors

We propose a novel HMM-based framework to accurately transliterate unseen named entities. The framework leverages features in letter-alignment and letter n-gram pairs learned from available bilingual dictionaries. Letter classes, such as vowels/non-vowels, are integrated to further improve transliteration accuracy. The proposed transliteration system is applied to out-of-vocabulary named entities in statistical machine translation (SMT), and a significant improvement over a traditional transliteration approach is obtained. Furthermore, by incorporating an automatic spell-checker based on statistics collected from web search engines, transliteration accuracy is further improved. The proposed system is implemented within our SMT system and applied to a real translation scenario from Ara...