Paper: Detection of Simple Plagiarism in Computer Science Papers

ACL ID C10-1048
Title Detection of Simple Plagiarism in Computer Science Papers
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2010

Plagiarism is the use of the language and thoughts of another work and the representation of them as one's own original work. Various levels of plagiarism exist in many domains in general and in academic papers in particular. Therefore, diverse efforts have been made to identify plagiarism automatically. In this research, we developed software capable of simple plagiarism detection. We have built a corpus (C) containing 10,100 academic papers in computer science written in English, and two test sets consisting of papers randomly chosen from C. A wide variety of baseline methods has been developed to identify identical or similar papers. Several of these methods are novel. The experimental results and their analysis show interesting findings. Some of the novel method...
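The abstract does not specify which baseline methods are used to identify identical or similar papers. To illustrate what such a baseline can look like, here is a minimal sketch that assumes a common generic technique: Jaccard similarity over word n-gram sets, compared pairwise across a small collection. This is not the authors' method. The function names (`word_ngrams`, `jaccard_similarity`, `flag_similar_pairs`) and the defaults `n=3` and `threshold=0.5` are hypothetical choices made only for this illustration.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into word tokens, and return its set of word n-grams.

    Texts with fewer than n tokens yield an empty set.
    """
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard coefficient |A & B| / |A | B| of the two texts' word n-gram sets."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0  # both texts too short to form any n-gram
    return len(grams_a & grams_b) / len(union)


def flag_similar_pairs(
    docs: dict[str, str], threshold: float = 0.5, n: int = 3
) -> list[tuple[str, str, float]]:
    """Compare every pair of documents and return (id1, id2, score) for pairs at or above threshold.

    Hypothetical helper: the threshold is illustrative, not a value from the paper.
    """
    ids = sorted(docs)
    flagged = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = jaccard_similarity(docs[first], docs[second], n)
            if score >= threshold:
                flagged.append((first, second, score))
    return flagged


if __name__ == "__main__":
    papers = {
        "p1": "We propose a new parser for dependency grammars.",
        "p2": "We propose a new parser for dependency grammars.",
        "p3": "Graph coloring heuristics for register allocation.",
    }
    print(flag_similar_pairs(papers))
```

Identical texts score 1.0, and texts with no shared trigrams score 0.0. Because every pair is compared, the cost grows quadratically with collection size. A corpus the size of C (10,100 papers) would already involve about 51 million comparisons, so a practical system would likely need some indexing or candidate filtering before this step.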