Paper: Arabizi Detection and Conversion to Arabic

ACL ID: W14-3629
Title: Arabizi Detection and Conversion to Arabic
Venue: Workshop on Arabic Natural Language Processing
Year: 2014

Arabizi is Arabic text that is written using Latin characters. Arabizi is used to represent both Modern Standard Arabic (MSA) and Arabic dialects. It is commonly used in informal settings such as social networking sites and is often mixed with English. In this paper we address two problems: identifying Arabizi in text and converting it to Arabic characters. We used word- and sequence-level features to identify Arabizi that is mixed with English, achieving an identification accuracy of 98.5%. For conversion, we used transliteration mining with language modeling to generate equivalent Arabic text, achieving 88.7% conversion accuracy; roughly a third of the errors were spelling and morphological variants of the forms in the ground truth.
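To make the conversion step concrete, the sketch below shows one common way of combining transliteration candidates with a language model: each Arabizi token proposes scored Arabic candidates, and a Viterbi search over the resulting lattice picks the sequence that maximizes transliteration score times bigram language-model probability. This is an illustrative sketch under stated assumptions, not the authors' implementation. The tables `TRANSLIT` and `BIGRAM`, their probabilities, the `FLOOR` smoothing constant, and the function names are all hypothetical. A small word-level lookup table stands in for a mined transliteration model, and unknown tokens (e.g. English words) are passed through unchanged.

```python
import math
from typing import Dict, List, Tuple

# HYPOTHETICAL stand-in for a mined transliteration model:
# Arabizi token -> {Arabic candidate: P(candidate | token)}.
TRANSLIT: Dict[str, Dict[str, float]] = {
    "ana": {"انا": 0.6, "أنا": 0.4},   # two spelling variants of "I"
    "b7eb": {"بحب": 0.9, "بحيب": 0.1},
    "masr": {"مصر": 0.8, "مسر": 0.2},
}

# HYPOTHETICAL bigram language model over Arabic words; "<s>" marks sentence start.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("<s>", "انا"): 0.3,
    ("<s>", "أنا"): 0.2,
    ("انا", "بحب"): 0.4,
    ("أنا", "بحب"): 0.5,
    ("بحب", "مصر"): 0.6,
}
FLOOR = 1e-6  # crude floor probability for unseen bigrams (assumed, not tuned)


def lm_logprob(prev: str, word: str) -> float:
    """Log P(word | prev) under the toy bigram model, floored for unseen pairs."""
    return math.log(BIGRAM.get((prev, word), FLOOR))


def convert(tokens: List[str]) -> List[str]:
    """Viterbi decoding over the candidate lattice; each state is the last Arabic word."""
    # best maps last word -> (log score, best path ending in that word)
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for tok in tokens:
        # Unknown tokens (e.g. English words) are kept as their only candidate.
        cands = TRANSLIT.get(tok, {tok: 1.0})
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for cand, p_trans in cands.items():
            # Extend every surviving path with this candidate and keep the best one.
            scored = [
                (score + math.log(p_trans) + lm_logprob(prev, cand), path)
                for prev, (score, path) in best.items()
            ]
            score, path = max(scored, key=lambda item: item[0])
            new_best[cand] = (score, path + [cand])
        best = new_best
    # Return the highest-scoring complete path.
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(" ".join(convert(["ana", "b7eb", "masr"])))
```

With these toy numbers, `convert(["ana", "b7eb", "masr"])` returns `["انا", "بحب", "مصر"]`: the language model prefers `انا` over the variant `أنا` here, which also illustrates how a valid spelling variant can end up counted as a mismatch against ground truth. A real system would score candidates at the character level so that unseen Arabizi words can still be transliterated.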