Paper: Arabic Native Language Identification

ACL ID W14-3625
Title Arabic Native Language Identification
Venue Workshop on Arabic Natural Language Processing
Year 2014

In this paper we present the first application of Native Language Identification (NLI) to Arabic learner data. NLI, the task of predicting a writer's first language from their writing in other languages, has been mostly investigated with English data, but is now expanding to other languages. We use L2 texts from the newly released Arabic Learner Corpus and, with a combination of three syntactic features (CFG production rules, Arabic function words and Part-of-Speech n-grams), demonstrate that they are useful for this task. Our system achieves an accuracy of 41% against a baseline of 23%, providing the first evidence for classifier-based detection of language transfer effects in L2 Arabic. Such methods can be useful for studying language transfer, developing teaching mate- ...
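
Two of the feature types named in the abstract (function-word counts and Part-of-Speech n-grams), and the idea of comparing against a baseline, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the transliterated function-word list, the tag set, and the L1 labels below are all hypothetical, and the sketch assumes a majority-class baseline, which the abstract does not specify.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Count contiguous POS-tag n-grams in one text (tags is a list of strings)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def function_word_counts(tokens, function_words):
    """Count only the tokens that appear in the supplied function-word list."""
    return Counter(t for t in tokens if t in function_words)


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label (assumed baseline type)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Hypothetical toy inputs, for illustration only.
tags = ["NOUN", "VERB", "NOUN", "VERB", "PREP"]
tokens = ["fi", "al-bayt", "wa", "fi", "al-madrasa"]  # transliterated placeholders
labels = ["A", "B", "A", "C", "A"]                     # placeholder L1 labels

bigrams = pos_ngrams(tags, 2)
fw = function_word_counts(tokens, {"fi", "wa", "min"})
baseline = majority_baseline_accuracy(labels)
```

Feature counts of this kind would typically be converted to vectors and passed to a supervised classifier; the reported accuracy of 41% is then read against the 23% baseline.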