Paper: User Edits Classification Using Document Revision Histories

ACL ID E12-1036
Title User Edits Classification Using Document Revision Histories
Venue Annual Meeting of The European Chapter of The Association of Computational Linguistics
Session Main Conference
Year 2012

Document revision histories are a useful and abundant source of data for natural language processing, but selecting relevant data for the task at hand is not trivial. In this paper we introduce a scalable approach for automatically distinguishing between factual and fluency edits in document revision histories. The approach is based on supervised machine learning using language model probabilities, string similarity measured over different representations of user edits, comparison of part-of-speech tags and named entities, and a set of adaptive features extracted from large amounts of unlabeled user edits. Applied to contiguous edit segments, our method achieves statistically significant improvements over a simple yet effective edit-distance baseline. It reaches high classifi...
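
To make the idea of an edit-distance baseline concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an edit is represented as a (before, after) pair of strings, that distance is measured at the character level, and that small surface changes are labeled fluency edits while larger ones are labeled factual. The function names and the threshold value of 4 are hypothetical choices made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def classify_edit(before: str, after: str, threshold: int = 4) -> str:
    """Label an edit segment as 'fluency' or 'factual'.

    Assumption (hypothetical): edits whose character-level distance is at
    most `threshold` are treated as fluency edits; all others as factual.
    """
    return "fluency" if levenshtein(before, after) <= threshold else "factual"


if __name__ == "__main__":
    print(classify_edit("recieve", "receive"))
    print(classify_edit("Paris", "the capital of France"))
```

A purely surface-based rule like this cannot see that a one-digit change to a date alters the facts, which is the kind of case the richer features listed in the abstract (language model probabilities, part-of-speech tags, named entities) are meant to address.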