Paper: Exploring Adaptor Grammars for Native Language Identification

ACL ID D12-1064
Title Exploring Adaptor Grammars for Native Language Identification
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2012

The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part-of-speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story...
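The conventional feature mix the abstract mentions (fixed-n character n-grams, fixed-n PoS-tag n-grams, and unigram function words) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the feature-name prefixes, and the tiny `FUNCTION_WORDS` list are all hypothetical, and the PoS tags are assumed to come from an external tagger.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = {"the", "of", "a", "in", "to", "and", "that"}


def char_ngrams(text, n):
    """Count overlapping character n-grams for a fixed n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pos_ngrams(tags, n):
    """Count overlapping PoS-tag n-grams for a fixed n."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def function_word_unigrams(tokens):
    """Count lowercased tokens that appear in the function-word list."""
    return Counter(t.lower() for t in tokens if t.lower() in FUNCTION_WORDS)


def extract_features(tokens, tags, n=2):
    """Merge the three feature families into one sparse count vector."""
    feats = Counter()
    text = " ".join(tokens)
    for gram, count in char_ngrams(text, n).items():
        feats["char:" + gram] += count
    for gram, count in pos_ngrams(tags, n).items():
        feats["pos:" + "_".join(gram)] += count
    for word, count in function_word_unigrams(tokens).items():
        feats["fw:" + word] += count
    return feats


# Example input; the tags are assumed to be produced by some PoS tagger.
tokens = ["The", "cat", "sat", "in", "the", "hat"]
tags = ["DT", "NN", "VBD", "IN", "DT", "NN"]
features = extract_features(tokens, tags, n=2)
```

A count vector of this kind could then be passed to any maxent (multinomial logistic regression) classifier. Collocations of arbitrary length, which the abstract proposes learning with adaptor grammars, are exactly what a fixed `n` like this cannot capture.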