Paper: Detecting Errors Within A Corpus Using Anomaly Detection

ACL ID A00-2020
Title Detecting Errors Within A Corpus Using Anomaly Detection
Venue Annual Conference of the North American Chapter of the Association for Computational Linguistics
Session Main Conference
Year 2000
Authors

We present a method for automatically detecting errors in a manually marked corpus using anomaly detection. Anomaly detection is a method for determining which elements of a large data set do not conform to the whole. This method fits a probability distribution over the data and applies a statistical test to detect anomalous elements. In the corpus error detection problem, anomalous elements are typically marking errors. We present the results of applying this method to the tagged portion of the Penn Treebank corpus.
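
To make the general idea concrete, the following is a minimal sketch in Python. It is not the model or the statistical test used in the paper; it substitutes a much simpler stand-in: an empirical estimate of P(tag | word) with a fixed probability cutoff. The function name `flag_rare_tags`, the parameters `threshold` and `min_count`, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def flag_rare_tags(tagged_tokens, threshold=0.05, min_count=20):
    """Return indices of tokens whose tag is improbable for that word.

    Simplified stand-in for anomaly detection: the "distribution" is the
    relative frequency of each tag per word, and the "test" is a fixed
    cutoff (an assumption, not the paper's statistical test).
    """
    # Fit: count how often each word carries each tag.
    word_tag_counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        word_tag_counts[word][tag] += 1

    flagged = []
    for i, (word, tag) in enumerate(tagged_tokens):
        counts = word_tag_counts[word]
        total = sum(counts.values())
        # Skip rare words: too little evidence to call any tag anomalous.
        if total < min_count:
            continue
        # Flag the token if its tag is improbable given the word.
        if counts[tag] / total < threshold:
            flagged.append(i)
    return flagged


# Toy data (made up): "the" tagged DT 99 times and NN once.
corpus = [("the", "DT")] * 99 + [("the", "NN")]
print(flag_rare_tags(corpus))  # [99]
```

In this toy corpus the single ("the", "NN") token receives an estimated probability of 0.01 and is flagged as a candidate marking error. A flagged token is only a candidate: a rare tag can also be a correct but unusual usage, so the output would still need manual review.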