Paper: Combining Shallow and Linguistically Motivated Features in Native Language Identification

ACL ID W13-1726
Title Combining Shallow and Linguistically Motivated Features in Native Language Identification
Venue Innovative Use of NLP for Building Educational Applications
Session
Year 2013
Authors

We explore a range of features and ensembles for the task of Native Language Identification as part of the NLI Shared Task (Tetreault et al., 2013). Starting with recurring word-based n-grams (Bykh and Meurers, 2012), we tested different linguistic abstractions such as part-of-speech, dependencies, and syntactic trees as features for NLI. We also experimented with features encoding morphological properties, the nature of the realizations of particular lemmas, and several measures of complexity developed for proficiency and readability classification (Vajjala and Meurers, 2012). Employing an ensemble classifier incorporating all of our features, we achieved an accuracy of 82.2% (rank 5) in the closed task and 83.5% (rank 1) in the open-2 task. In the open-1 task, the word-based...