Paper: Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English

ACL ID: W13-1703
Title: Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English
Venue: Innovative Use of NLP for Building Educational Applications
Year: 2013

We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NUCLE has been available for almost two years, there has been no reference paper that describes the corpus in detail. In this paper, we address this need. We describe the annotation schema and the data collection and annotation process of NUCLE. Most importantly, we report on an unpublished study of annotator agreement for grammatical error correction. Finally, we present statistics on the distribution of grammatical errors in the NUCLE corpus.