Paper: Detection of Simple Plagiarism in Computer Science Papers

ACL ID C10-1048
Venue International Conference on Computational Linguistics
Session Main Conference
Year 2010

Plagiarism is the use of the language and thoughts of another work and the representation of them as one's own original work. Various levels of plagiarism exist in many domains in general and in academic papers in particular. Therefore, diverse efforts are taken to automatically identify plagiarism. In this research, we developed software capable of simple plagiarism detection. We have built a corpus (C) containing 10,100 academic papers in computer science written in English and two test sets including papers that were randomly chosen from C. A widespread variety of baseline methods has been developed to identify identical or similar papers. Several methods are novel. The experimental results and their analysis show interesting findings. Some of the novel method...