Paper: Automatically Classifying Edit Categories in Wikipedia Revisions

ACL ID D13-1055
Title Automatically Classifying Edit Categories in Wikipedia Revisions
Venue Conference on Empirical Methods in Natural Language Processing
Session Main Conference
Year 2013

In this paper, we analyze a novel set of fea- tures for the task of automatic edit category classification. Edit category classification as- signs categories such as spelling error correc- tion, paraphrase or vandalism to edits in a doc- ument. Our features are based on differences between two versions of a document includ- ing meta data, textual and language properties and markup. In a supervised machine learning experiment, we achieve a micro-averaged F1 score of .62 on a corpus of edits from the En- glish Wikipedia. In this corpus, each edit has been multi-labeled according to a 21-category taxonomy. A model trained on the same data achieves state-of-the-art performance on the related task of fluency edit classification. We apply pattern mining to automatically la- beled edits in the re...