Hybrid System for Plagiarism Detection on A Scientific Paper
2021
Abstract
Plagiarism detection systems are critical for identifying instances of plagiarism, particularly in the educational sector when it comes to scientific publications and papers. Plagiarism occurs when material is copied without the author's consent or attribution. Identifying such acts requires thorough knowledge of plagiarism types and classes, and current tools and methodologies can detect several of these types. With the advancement of information and communication technologies (ICT) and the availability of online scientific publications, access to these publications has become more convenient. Combined with the availability of many text-editing programs, this has made plagiarism detection a crucial concern. Numerous scholarly articles have examined plagiarism detection and the two most often used datasets for it, WordNet and the PAN dataset. Researchers have described verbatim plagiarism as a straightforward case of copying and pasting, and have then shed light on clever plagiarism, which is more difficult to detect since it may involve altering the original text or borrowing ideas from other studies. Other scholars have noted that plagiarism can obscure the scientific content by substituting terms, deleting or introducing material, or rearranging or changing the original publications. The proposed system incorporates natural language processing (NLP) and machine learning (ML) techniques, together with an external plagiarism detection strategy based on text mining and similarity analysis. It employs a combination of Jaccard and cosine similarity and was evaluated on the PAN-PC-11 corpus. The findings show that the proposed system outperforms previous systems on PAN-PC-11, obtaining an accuracy of 0.96, a recall of 0.86, an F-measure of 0.86, and a PlagDet score of 0.86 (0.865). The proposed technique is substantiated by a designed application that detects plagiarism in scientific publications and generates notifications.
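The abstract names Jaccard and cosine similarity as the two measures being combined but does not say how they are combined. The following is a minimal sketch of one plausible combination, not the authors' implementation. It assumes word-level tokens, raw term-frequency vectors for the cosine measure, and an equally weighted mean of the two scores. The function names and the `weight` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard coefficient: shared distinct tokens over all distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between raw term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(text_a, text_b, weight=0.5):
    """Assumed combination: weighted mean of the Jaccard and cosine scores."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    return (weight * jaccard_similarity(tokens_a, tokens_b)
            + (1 - weight) * cosine_similarity(tokens_a, tokens_b))


# Compare a suspicious passage against a candidate source passage.
suspicious = "the cat sat on the mat"
source = "the cat lay on the mat"
print(round(hybrid_similarity(suspicious, source), 3))
```

Jaccard ignores how often a term occurs, while cosine over term frequencies rewards repeated shared terms, so averaging the two tempers the bias of each. A real detector would likely add preprocessing such as stop-word removal or stemming and compare passages against a tuned threshold, none of which is specified in the abstract.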