Paper: Combination Of Arabic Preprocessing Schemes For Statistical Machine Translation

ACL ID P06-1001
Title Combination Of Arabic Preprocessing Schemes For Statistical Machine Translation
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2006
Authors

Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. We also present and evaluate different methods for combining preprocessing schemes resulting in improved translation quality.
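The consistency requirement mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's method: the scheme names, the `preprocess` helper, and the toy rule that splits a leading "w" (the conjunction "and" in Buckwalter transliteration) are hypothetical, and a real system would rely on a morphological analyzer rather than a string prefix test. The point is only that the same scheme is applied to both training and test text.

```python
from typing import Callable, Dict, List

# A word-level scheme maps one input word to one or more output tokens.
Scheme = Callable[[str], List[str]]


def scheme_none(word: str) -> List[str]:
    """Hypothetical baseline: leave the word untouched."""
    return [word]


def scheme_split_conj(word: str) -> List[str]:
    """Hypothetical toy rule: split a leading 'w' conjunction clitic.

    This is a naive prefix test on transliterated text, used only to
    illustrate the idea of a word-level scheme; it is not a real analyzer.
    """
    if word.startswith("w") and len(word) > 3:
        return ["w+", word[1:]]
    return [word]


# Hypothetical registry of scheme names (illustrative labels only).
SCHEMES: Dict[str, Scheme] = {
    "none": scheme_none,
    "split_conj": scheme_split_conj,
}


def preprocess(sentence: str, scheme_name: str) -> List[str]:
    """Apply one named scheme to every whitespace-separated word."""
    scheme = SCHEMES[scheme_name]
    tokens: List[str] = []
    for word in sentence.split():
        tokens.extend(scheme(word))
    return tokens


# The same scheme name is used for training and test data.
scheme_name = "split_conj"
train_tokens = preprocess("wktb Alwld", scheme_name)
test_tokens = preprocess("wqAl wld", scheme_name)
```

Selecting the scheme by name in one place makes it harder to tokenize training and test data differently by accident, which is the only property this sketch is meant to show.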