Paper: Transforming Standard Arabic to Colloquial Arabic

ACL ID P12-2035
Title Transforming Standard Arabic to Colloquial Arabic
Venue Annual Meeting of the Association for Computational Linguistics
Session Short Paper
Year 2012

We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used for POS tagging, this process improves accuracy on unseen CEA text from 73.24% to 86.84% and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%. The process holds promise for any NLP task targeting the dialectal varieties of Arabic. For example, this approach may provide a cheap way to leverage MSA data and morphological resources to create resources for colloquial Arabic to English machine translation. It can also considerably speed up the annotation of Arabic dialects.
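To put the reported figures in perspective, the short sketch below computes the absolute and relative reductions in POS tagging error rate (100 minus accuracy) and in the out-of-vocabulary rate. It only does arithmetic on the numbers stated in the abstract. The "relative reduction" measure and the helper name `absolute_and_relative_change` are illustrative assumptions, not figures or code reported by the authors.

```python
def absolute_and_relative_change(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, drop relative to the starting value).

    Hypothetical helper for illustration; not part of the paper.
    """
    drop = before - after
    return drop, drop / before


# Figures reported in the abstract (percentages on unseen CEA text).
acc_before, acc_after = 73.24, 86.84
oov_before, oov_after = 28.98, 16.66

# Tagging error rate = 100 - accuracy, so a lower value is better.
err_drop, err_rel = absolute_and_relative_change(100 - acc_before, 100 - acc_after)
oov_drop, oov_rel = absolute_and_relative_change(oov_before, oov_after)

print(f"POS error: {100 - acc_before:.2f}% -> {100 - acc_after:.2f}% "
      f"({err_drop:.2f} points, {err_rel:.1%} relative reduction)")
print(f"OOV rate: {oov_before:.2f}% -> {oov_after:.2f}% "
      f"({oov_drop:.2f} points, {oov_rel:.1%} relative reduction)")
```

Read this way, the 13.60-point accuracy gain removes roughly half (about 50.8%) of the tagging errors, and the OOV rate falls by about 42.5% relative to its starting value.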