Paper: Machine Translation of Arabic Dialects

ACL ID N12-1006
Title Machine Translation of Arabic Dialects
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2012
Authors

Arabic Dialects present many challenges for machine translation, not least of which is the lack of data resources. We use crowdsourcing to cheaply and quickly build Levantine-English and Egyptian-English parallel corpora, consisting of 1.1M words and 380k words, respectively. The dialectal sentences are selected from a large corpus of Arabic web text, and translated using Amazon's Mechanical Turk. We use this data to build Dialectal Arabic MT systems, and find that small amounts of dialectal data have a dramatic impact on translation quality. When translating Egyptian and Levantine test sets, our Dialectal Arabic MT system performs 6.3 and 7.0 BLEU points higher than a Modern Standard Arabic MT system trained on a 150M-word Arabic-English parallel corpus.