Paper: Detection of Non-Native Sentences Using Machine-Translated Training Data

ACL ID N07-2024
Title Detection of Non-Native Sentences Using Machine-Translated Training Data
Venue Human Language Technologies
Session Short Paper
Year 2007
Authors

Training statistical models to detect non-native sentences requires a large corpus of non-native writing samples, which is often not readily available. This paper examines the extent to which machine-translated (MT) sentences can substitute as training data. Two tasks are examined. For the native vs. non-native classification task, non-native training data yields better performance; for the ranking task, however, models trained with a large, publicly available set of MT data perform as well as those trained with non-native data.