Paper: Extracting the Native Language Signal for Second Language Acquisition

ACL ID N13-1009
Title Extracting the Native Language Signal for Second Language Acquisition
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2013
Authors

We develop a method for effective extraction of linguistic patterns that are differentially expressed based on the native language of the author. This method uses multiple corpora to allow for the removal of dataset-specific patterns, and addresses both feature relevancy and redundancy. We evaluate different relevancy ranking metrics and show that common measures of relevancy can be inappropriate for data with many rare features. Our feature set is a broad class of syntactic patterns, and to better capture the signal we extend the Bayesian Tree Substitution Grammar induction algorithm to a supervised mixture of latent grammars. We show that this extension can be used to extract a larger set of relevant features.