DOC-SHEILD Plagiarism Detector
Abstract
Original thought and authenticity form the bedrock of academic and professional work, and plagiarism detection has become a key safeguard for intellectual property. Traditional plagiarism detectors, however, struggle to detect paraphrased, translated, or contextually altered content. This paper describes a proposed system that applies NLP, deep learning techniques, and advanced linguistic analysis to enhance the accuracy and efficiency of plagiarism detection. The proposed system integrates context-aware algorithms with semantic similarity assessment to overcome the limitations of traditional methods, potentially strengthening the academic integrity of institutions and the authenticity of published works.
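As a rough illustration of the semantic similarity assessment mentioned in the abstract (not the proposed system itself), the sketch below scores a suspicious sentence against source sentences using sentence embeddings and cosine similarity; the sentence-transformers model name, the 0.8 threshold, and the example texts are assumptions made for illustration.

```python
# A minimal sketch of semantic similarity scoring with sentence embeddings.
# The model name, the 0.8 threshold, and the texts are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

source_sentences = [
    "The experiment was repeated three times to confirm the results.",
    "Plagiarism detection protects intellectual property in academia.",
]
suspicious = "To verify the findings, the trial was carried out three separate times."

src_emb = model.encode(source_sentences, convert_to_tensor=True)
sus_emb = model.encode(suspicious, convert_to_tensor=True)

# Cosine similarity between the suspicious sentence and every source sentence.
scores = util.cos_sim(sus_emb, src_emb)[0]
for sentence, score in zip(source_sentences, scores):
    verdict = "possible paraphrase" if float(score) > 0.8 else "likely original"
    print(f"{float(score):.2f}  {verdict}  <- {sentence}")
```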
Related papers
IAES International Journal of Artificial Intelligence (IJ-AI), 2021
Finding plagiarism strings between two given documents is the main task of the plagiarism detection problem. Traditional approaches based on string matching are not very useful in cases of semantically similar plagiarism. Deep learning approaches solve this problem by measuring the semantic similarity between pairs of sentences. However, these approaches still face the following challenges. First, they cannot handle cases where only part of a sentence belongs to a plagiarism passage. Second, measuring sentential similarity without considering the context of surrounding sentences decreases accuracy. To solve these problems, this paper proposes a two-phase plagiarism detection system based on a multi-layer long short-term memory network model and feature extraction techniques: (i) a passage phase to recognize plagiarism passages, and (ii) a word phase to determine the exact plagiarism strings. Our experimental results on the PAN 2014 corpus reached 94.26% F-mea...
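The paper's two-phase architecture is not reproduced here, but the following sketch shows the general shape of a multi-layer LSTM sentence-pair classifier in Keras, the kind of model such approaches build on; the vocabulary size, layer widths, and sequence length are illustrative assumptions.

```python
# A generic multi-layer LSTM sentence-pair classifier, sketched in Keras to show
# the general shape of an LSTM-based similarity model; it is not the two-phase
# architecture from the cited paper, and all sizes here are assumptions.
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20000, 128, 60

def build_encoder() -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(MAX_LEN,), dtype="int32")
    x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(inputs)
    x = tf.keras.layers.LSTM(128, return_sequences=True)(x)   # first LSTM layer
    x = tf.keras.layers.LSTM(64)(x)                           # second LSTM layer
    return tf.keras.Model(inputs, x)

encoder = build_encoder()
src = tf.keras.Input(shape=(MAX_LEN,), dtype="int32", name="source_sentence")
sus = tf.keras.Input(shape=(MAX_LEN,), dtype="int32", name="suspicious_sentence")

# Encode both sentences with the shared encoder and classify the pair.
merged = tf.keras.layers.Concatenate()([encoder(src), encoder(sus)])
output = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model([src, sus], output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```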
2020
Plagiarism is one of the major concerns in academics, literature, and other fields where it is necessary to check whether an idea is original. Simply put, plagiarism means the act of copying someone's work and portraying it as your own. It is ethically incorrect and is considered a crime. Many tools for finding plagiarism are available, either for download or for direct use online. These tools check similarity at the lexical and sentence level only. Hence, they only make a statistical comparison of whether the sentence is plagiarised, not whether the idea is plagiarised. This project deals with detecting plagiarism at the semantic level, identifying paraphrases, and ignoring named entities that add to unnecessary plagiarism percentages. To achieve this, we use Latent Semantic Analysis and a Bidirectional LSTM model for paraphrase detection. The final plagiarism uses a neural network t...
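As a loose illustration of two ingredients named in this abstract, the sketch below removes named entities with spaCy and compares documents with a simple TF-IDF plus truncated-SVD form of Latent Semantic Analysis; the tools, toy corpus, and component count are assumptions, and the paper's BiLSTM paraphrase detector is not shown.

```python
# Sketch: drop named entities with spaCy, then compare documents with a simple
# TF-IDF + truncated-SVD form of Latent Semantic Analysis (LSA). Tools, corpus,
# and the number of latent components are assumptions for illustration only.
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

nlp = spacy.load("en_core_web_sm")

def strip_named_entities(text: str) -> str:
    """Remove named entities so shared names do not inflate the similarity."""
    doc = nlp(text)
    return " ".join(tok.text for tok in doc if tok.ent_type_ == "")

docs = [
    "Alan Turing proposed the imitation game as a test of machine intelligence.",
    "The imitation game was introduced to test whether machines can think.",
    "Rainfall totals vary strongly between coastal and inland regions.",
]
cleaned = [strip_named_entities(d) for d in docs]

# LSA: a TF-IDF matrix reduced to a small latent space, then cosine similarity.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(cleaned)
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print(cosine_similarity(lsa[:1], lsa[1:]))  # doc 0 vs. docs 1 and 2
```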
JOURNAL OF EDUCATION AND SCIENCE, 2022
The Web provides various kinds of data and applications that are readily available to explore and is considered a powerful tool for humans. Copyright violation in web documents occurs when there is an unauthorized copy of information or text from the original document on the web; this violation is known as plagiarism. Plagiarism Detection (PD) can be defined as the procedure that finds similarities between a document and other documents based on lexical, semantic, and syntactic textual features. Approaches for the numeric representation (vectorization) of text, such as the Vector Space Model (VSM) and word embeddings, along with text similarity measures such as cosine and Jaccard, are essential for plagiarism detection. This paper deals with the concepts of plagiarism, kinds of plagiarism, textual features, text similarity measures, and plagiarism detection methods based on intelligent or traditional techniques. Furthermore, different traditional algorithms and deep learning algorithms, for instance Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), are discussed as plagiarism detectors. Besides that, this work reviews many other papers that address the topic of plagiarism and its detection.
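A minimal sketch of the two similarity measures mentioned above: cosine similarity over a bag-of-words Vector Space Model and Jaccard similarity over token sets. The example texts are placeholders.

```python
# Cosine similarity in a term-frequency Vector Space Model and Jaccard
# similarity over token sets; the two documents are illustrative placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

doc_a = "Plagiarism is the unauthorized copy of text from an original document."
doc_b = "Copying text from an original web document without permission is plagiarism."

# Cosine similarity between the two term-frequency vectors.
vectors = CountVectorizer().fit_transform([doc_a, doc_b])
cos = cosine_similarity(vectors[0], vectors[1])[0, 0]

# Jaccard similarity on the sets of lower-cased tokens.
tokens_a, tokens_b = set(doc_a.lower().split()), set(doc_b.lower().split())
jac = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

print(f"cosine={cos:.2f}, jaccard={jac:.2f}")
```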
International Journal of Computer and Information System (IJCIS), 2024
Presently available plagiarism detection technologies are primarily restricted to string-level comparisons between potentially original texts and suspiciously plagiarized materials. The objective of this research is to enhance the precision of plagiarism identification by integrating Natural Language Processing (NLP) methods into current methodologies. Our proposal is an external plagiarism detection framework that uses various NLP approaches to examine a set of original and suspicious papers. The techniques analyze not only text strings but also the text's structure, taking textual relations into consideration. Preliminary findings on a corpus of short plagiarized paragraphs demonstrate that the NLP approaches increase the accuracy of current methods.
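The exact pipeline of this framework is not specified here, but a typical NLP preprocessing stage of the kind such detectors apply before comparison might look like the following NLTK-based sketch (tokenisation, lower-casing, stop-word removal, lemmatisation); the choice of NLTK and the specific steps are assumptions.

```python
# An assumed NLTK preprocessing pipeline: tokenise, lower-case, drop stop
# words, and lemmatise. It stands in for whatever the framework actually uses.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for resource in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

def preprocess(text: str) -> list[str]:
    lemmatizer = WordNetLemmatizer()
    stop = set(stopwords.words("english"))
    tokens = nltk.word_tokenize(text.lower())
    return [lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in stop]

print(preprocess("The techniques analyze not only text strings but also the text's structure."))
```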
International Journal of Advanced Computer Science and Applications, 2022
Effective detection has been extremely difficult due to plagiarism's pervasiveness across a variety of fields, including academia and research. Increasingly complex plagiarism strategies are being used, making traditional detection approaches ineffective. The assessment of plagiarism involves a comprehensive examination encompassing syntactic, lexical, semantic, and structural facets. In contrast to traditional string-matching techniques, this investigation adopts a sophisticated Natural Language Processing (NLP) framework. The preprocessing phase entails a series of steps that refine the raw text data. The crux of this methodology lies in the integration of two distinct metrics within the Encoder Representation from Transformers (E-BERT) approach, facilitating a granular exploration of textual similarity. Within NLP, the combination of deep and shallow approaches serves as a lens for examining the nuances of the text, uncovering underlying layers of meaning. The outcomes of this research reveal the proficiency of deep NLP in promptly identifying substantial revisions. Integral to this innovation is the novel use of the Waterman algorithm and an English-Spanish dictionary, which contribute to the selection of optimal attributes. Comparative evaluations against alternative models employing distinct encoding methodologies, along with logistic regression as a classifier, underscore the potency of the proposed implementation. Extensive experimentation substantiates the system's performance, with a 99.5% accuracy rate in extracting instances of plagiarism. This research is a significant advancement in the domain of plagiarism detection, introducing effective and sophisticated methods to combat the growing spectre of unoriginal content.
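The abstract mentions a Waterman-style alignment contributing to attribute selection; the sketch below is a generic Smith-Waterman local alignment over word tokens, written from the textbook algorithm rather than from the paper, with arbitrary scoring values.

```python
# A generic Smith-Waterman-style local alignment over word tokens, included only
# to illustrate the kind of alignment the abstract alludes to; scoring values
# are arbitrary assumptions and this is not the paper's actual procedure.
def local_alignment_score(a: list[str], b: list[str],
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best  # score of the highest-scoring locally aligned word run

src = "the quick brown fox jumps over the lazy dog".split()
sus = "a quick brown fox leaps over a lazy dog".split()
print(local_alignment_score(src, sus))
```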
International Journal of Interactive Mobile Technologies (iJIM)
Academic plagiarism has become a serious concern, as it hampers scientific progress and violates intellectual property. In this context, we present a study aimed at the detection of cross-lingual plagiarism based on Natural Language Processing (NLP), embedding techniques, and deep learning. Many systems have been developed to tackle this problem, and many rely on machine learning and deep learning methods. In this paper, we propose a Cross-language Plagiarism Detection (CL-PD) method based on the Doc2Vec embedding technique and a Siamese Long Short-Term Memory (SLSTM) model. Embedding techniques help capture the text's contextual meaning and improve the CL-PD system's performance. To show the effectiveness of our method, we conducted a comparative study with other techniques such as GloVe, FastText, BERT, and Sen2Vec on a dataset combining PAN11, JRC-Acquis, Europarl, and Wikipedia. The experiments for the Spanish-English language pair show that Doc2Vec...
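As an illustration of the Doc2Vec embedding step named above (the Siamese LSTM classifier is not reproduced), the sketch below trains a tiny gensim Doc2Vec model and retrieves the most similar document for a paraphrased query; the corpus, vector size, and epoch count are toy assumptions.

```python
# Sketch of document embeddings with gensim's Doc2Vec; corpus, vector size, and
# epochs are illustrative assumptions, not the settings used in the cited paper.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "academic plagiarism violates intellectual property",
    "copying scholarly work without credit is plagiarism",
    "weather forecasts improve with better satellite data",
]
tagged = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(corpus)]

model = Doc2Vec(vector_size=50, min_count=1, epochs=40)
model.build_vocab(tagged)
model.train(tagged, total_examples=model.corpus_count, epochs=model.epochs)

# Infer a vector for an unseen (paraphrased) sentence and retrieve the most
# similar training document by cosine similarity.
query = model.infer_vector("plagiarism is the uncredited copying of academic work".split())
print(model.dv.most_similar([query], topn=1))
```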
International Conference on Expert Clouds and Applications (ICOECA), 2024
Detecting strongly paraphrased and translated texts is challenging for existing detection tools, as they rely on traditional approaches of word searching and matching. Automated systems play a crucial role in identifying instances of plagiarism, thereby upholding the integrity of intellectual work. This study presents a system that detects plagiarism in paraphrased texts using Natural Language Processing techniques, specifically the word embedding techniques Word2Vec and Bidirectional Encoder Representations from Transformers (BERT). By combining different approaches and techniques, the results showed that the hybrid model achieved 93% accuracy in detecting paraphrased plagiarism. After the integration of the model into the system, the evaluation using the ISO 25010 accumulated excellent
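A rough sketch of the hybrid idea, assuming one similarity score from averaged Word2Vec vectors and a second, separately computed transformer-based score represented here by a placeholder value; the weights and threshold are assumptions, not the study's settings.

```python
# Combine two similarity signals: cosine similarity over averaged Word2Vec
# vectors (trained on a toy corpus) and a transformer score that is only a
# placeholder value here. The weights and the threshold are assumptions.
import numpy as np
from gensim.models import Word2Vec

sentences = [s.split() for s in (
    "students copy essays from the web",
    "learners reproduce online articles in their essays",
    "the rover collected rock samples on mars",
)]
w2v = Word2Vec(sentences, vector_size=50, min_count=1, epochs=100, seed=0)

def centroid(tokens: list[str]) -> np.ndarray:
    """Average the word vectors of the tokens present in the vocabulary."""
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

a, b = centroid(sentences[0]), centroid(sentences[1])
w2v_score = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

bert_score = 0.90                              # placeholder for a BERT similarity
hybrid = 0.4 * w2v_score + 0.6 * bert_score    # assumed weighting
print("paraphrase" if hybrid > 0.8 else "original", round(hybrid, 2))
```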
Expert Systems with Applications, 2015
Plagiarism is described as the reuse of someone else's previous ideas, work, or even words without sufficient attribution to the source. This paper presents a method to detect external plagiarism by integrating semantic relations between words with their syntactic composition. The problem with the available methods is that they fail to capture meaning when comparing a source document sentence and a suspicious document sentence whose surface text is the same (the words are the same) or which are paraphrases of each other, which causes inaccurate or unnecessary matching results. The proposed method can improve the performance of plagiarism detection because it avoids selecting a source sentence whose surface similarity with the suspicious sentence is high but whose meaning is different. It does so by computing sentence-to-sentence semantic and syntactic similarity. In addition, the proposed method expands the words in sentences to tackle the problem of limited information. It bridges the lexical gaps between semantically similar contexts that are expressed in different wording. The method is also capable of identifying various kinds of plagiarism, such as exact copying, paraphrasing, sentence transformation, and changes of word structure within sentences. Experimental results show that the proposed method improves performance compared with the participating systems in PAN-PC-11, and that it outperforms other existing techniques on the PAN-PC-10 and PAN-PC-11 datasets.
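As an illustration of the word-expansion idea described above, the sketch below expands tokens with WordNet synonyms so that sentences expressed in different wording can still overlap; the crude overlap measure and example sentences are assumptions, not the paper's actual similarity computation.

```python
# Expand each token with its WordNet synonyms to bridge lexical gaps between
# semantically similar sentences expressed in different wording. NLTK's WordNet
# interface is assumed; the overlap ratio is only a crude stand-in measure.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def expand(tokens: list[str]) -> set[str]:
    expanded = set(tokens)
    for tok in tokens:
        for syn in wn.synsets(tok):
            expanded.update(lemma.name().lower() for lemma in syn.lemmas())
    return expanded

src = "the author copied the essay".lower().split()
sus = "the writer duplicated the paper".lower().split()

# Overlap of the expanded vocabularies as a rough lexical-gap-bridging score.
a, b = expand(src), expand(sus)
print(len(a & b) / len(a | b))
```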
IJARIIE, 2023
Plagiarism is the act of copying another person's work without giving that person or source credit or a source citation. Plagiarism detectors are made to detect this misconduct. Platforms powered by AI can offer a variety of techniques for making content readable, impactful, and grammatically sound. The primary justification for plagiarism is that it is the simplest and quickest way to complete literary projects such as academic papers, articles, or essays; students sometimes find it difficult to handle the workload and project deadlines, and therefore prefer to plagiarize rather than produce unique work. In recent years, plagiarism detection across languages has received particular attention; in this section, we review various research from 2015 to the present. One study applied a new concept-based measure for weighting concepts in graph representations, along with knowledge graph analysis. That research centered on the utilization of several aspects of multilingual graphs, including word sense disambiguation, vocabulary expansion, and representation by similarities with a group of concepts. In addition to knowledge graph representation and plagiarism detection, an extension was made using continuous space representation (i.e., word embeddings) and alignment similarity analysis. Another study combined semantic relatedness metrics from WordNet's knowledge networks between words and ideas to assess how comparable phrases are.
G. Tsatsaronis, I. Varlamis, A. Giannakoulopoulos and N. Kanellopoulos (2010) "Identifying free text plagiarism based on semantic similarity", in Proceedings of the 4th International Plagiarism Conference (IPC 2010), 21-23 June 2010, Newcastle, U.K.
It is common knowledge that plagiarism in academia goes back as far as research itself. However, in the last two decades this phenomenon of academic deception has turned into an academic plague. Undoubtedly, the rapid expansion of the Web and the vast amount of publicly available information and documents facilitate the unethical malpractice of computer-aided plagiarism, which in turn has inflated the problem. Anti-plagiarism techniques build upon technological solutions and especially the development of task-specific software. The role of anti-plagiarism software for text is to process a document and identify the pieces of text that have been reproduced from another source. This work presents a semantic-based approach to text-plagiarism detection which improves the efficiency of traditional keyword-matching techniques. Our semantic matching technique is able to detect a larger variety of paraphrases, including the use of synonyms, the repositioning of words within sentences, etc. We evaluate our methodology on a dataset comprising positive and negative plagiarism samples and present comparative results of both supervised and unsupervised methods.
