Paper: Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis

ACL ID P11-2015
Title Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis
Venue Annual Meeting of the Association for Computational Linguistics
Session Main Conference
Year 2011
Authors

Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work to date relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism detection. In particular, we hypothesize that textual vandalism constitutes a unique genre where a group of people share a similar linguistic behavior. Experimental results suggest that (1) statistical models give evidence to unique language styles in vandalism, and that (2) deep syntactic patterns based on probabilistic context free grammars (PCFG) discriminate vandalism more effecti...