Paper: Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

ACL ID P12-3025
Title Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
Venue Annual Meeting of the Association for Computational Linguistics
Session System Demonstration
Year 2012
Authors

In this paper, we introduce a framework that identifies online plagiarism by exploiting lexical, syntactic, and semantic features, including duplication-gram, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences. We establish an ensemble framework to combine the predictions of the individual models. Results demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms and commercial software.

Keywords Plagiarism Detection, Lexical, Syntactic, Semantic
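
The abstract names two ideas that a small example can make concrete: a lexical signal based on duplicated n-grams, and an ensemble that combines per-model predictions. The sketch below is illustrative only and is not the authors' implementation. The function names (`word_ngrams`, `ngram_overlap`, `ensemble_score`), the whitespace tokenization, the choice of bigrams, and the weighted-average combination rule are all assumptions; the abstract does not say how the system computes its features or combines its models.

```python
from typing import Dict, Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(suspect: str, source: str, n: int = 2) -> float:
    """Fraction of the suspect's word n-grams that also occur in the source.

    A hypothetical stand-in for a lexical duplication feature.
    """
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & word_ngrams(source, n)) / len(suspect_grams)


def ensemble_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-model scores with a weighted average.

    The weighted average is an assumed combination rule, chosen for simplicity.
    """
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * score for name, score in scores.items()) / total_weight


if __name__ == "__main__":
    lexical = ngram_overlap("the cat sat on the mat", "the cat sat on a rug")
    # The syntactic and semantic scores are placeholder values, not real model output.
    scores = {"lexical": lexical, "syntactic": 0.5, "semantic": 0.9}
    weights = {"lexical": 1.0, "syntactic": 1.0, "semantic": 2.0}
    print(round(ensemble_score(scores, weights), 3))
```

In a real system the syntactic and semantic scores would come from their own models (for example, comparisons of POS tag sequences or sentence-level similarity), and the weights would be fit on labeled data instead of being set by hand.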