Paper: Confusion Network for Arabic Name Disambiguation and Transliteration in Statistical Machine Translation

ACL ID: C14-1042
Title: Confusion Network for Arabic Name Disambiguation and Transliteration in Statistical Machine Translation
Venue: International Conference on Computational Linguistics
Session: Main Conference
Year: 2014
Authors:

Arabic words are often ambiguous between name and non-name interpretations, which frequently leads to incorrect name translations. We present a technique to disambiguate and transliterate names even when the name interpretations do not exist in the parallel training corpus or have relatively low probability there. The key idea comprises three steps: named entity classing at the preprocessing step, decoding of a simple confusion network created from the name class label and the input word at the statistical machine translation step, and transliteration of names at the post-processing step. Human evaluations indicate that the proposed technique leads to a statistically significant translation quality improvement on highly ambiguous evaluation data sets without degrading the translation quality of a...
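The three-step pipeline in the abstract (classing, confusion network decoding, transliteration) can be illustrated with a small Python sketch. Everything concrete in it is a hypothetical stand-in, not the authors' system: the `$name` label, the per-word name probabilities, the romanized toy tokens, the toy lexicon, the greedy per-slot decoder and the capitalization-based "transliteration". Each source word becomes one slot of the confusion network; a word that may be a name gets a second arc carrying the class label, and any label that survives decoding is replaced by a transliteration of its source word in post-processing.

```python
from dataclasses import dataclass

# Hypothetical class-label token meaning "this source word is a name".
NAME_LABEL = "$name"


@dataclass
class Arc:
    token: str   # alternative for this slot: the surface word or the class label
    prob: float  # assumed weight the decoder sees for this alternative


def build_confusion_network(words, name_probs):
    """One slot per source word; possible names get an extra class-label arc."""
    network = []
    for word, p in zip(words, name_probs):
        if p > 0.0:
            network.append([Arc(word, 1.0 - p), Arc(NAME_LABEL, p)])
        else:
            network.append([Arc(word, 1.0)])
    return network


def greedy_decode(network, lexicon):
    """Toy monotone decoder: pick the best arc per slot, translate word by word.

    Returns (target_token, source_index) pairs so labels can be traced back.
    """
    output = []
    for i, slot in enumerate(network):
        best = max(slot, key=lambda arc: arc.prob)
        if best.token == NAME_LABEL:
            output.append((NAME_LABEL, i))  # the label passes through untranslated
        else:
            output.append((lexicon.get(best.token, best.token), i))
    return output


def transliterate(word):
    """Hypothetical stand-in for a real transliteration model."""
    return word.capitalize()


def postprocess(decoded, words):
    """Replace each surviving class label with a transliteration of its source word."""
    return " ".join(
        transliterate(words[i]) if tgt == NAME_LABEL else tgt for tgt, i in decoded
    )


if __name__ == "__main__":
    words = ["qal", "amal"]  # romanized toy input; "amal" is ambiguous (name vs. "hope")
    lexicon = {"qal": "said", "amal": "hope"}
    for probs in ([0.0, 0.7], [0.0, 0.2]):
        net = build_confusion_network(words, probs)
        print(postprocess(greedy_decode(net, lexicon), words))
```

With a name probability of 0.7 the label arc wins and the output is "said Amal"; with 0.2 the surface word wins and it is translated as "said hope". A real SMT decoder would search the confusion network jointly with its phrase table and language model instead of choosing each slot independently. The greedy choice here only keeps the example short.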