Paper: Processing Spontaneous Orthography

ACL ID N13-1066
Title Processing Spontaneous Orthography
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013

In cases in which there is no standard orthography for a language or language variant, written texts will display a variety of orthographic choices. This is problematic for natural language processing (NLP) because it creates spurious data sparseness. We study the transformation of spontaneously spelled Egyptian Arabic into a conventionalized orthography which we have previously proposed for NLP purposes. We show that a two-stage process can reduce divergences from this standard by 69%, making subsequent processing of Egyptian Arabic easier.