
Fig. 3. Two passages with the same words, but the 2nd passage contains some letters with diacritics (highlighted in green) and a substitution of some interchangeable letters (highlighted in yellow). A simple plagiarism detector may fail to match them.

Regarding this aspect, Magooda et al. reported the use of two language-dependent processes in the source retrieval phase: stemming queries before submitting them to the search engine, and extracting named entities. In the text alignment phase, words are stemmed in the skip-gram approach. Moreover, their methods pre-process the text by removing diacritics and normalizing letters. Alzahrani's method is nearly language independent. The only reported language-specific process was stop-word removal, applied as a pre-processing step to both suspicious and source documents.

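The pre-processing described above (removing diacritics and normalizing interchangeable letters) can be illustrated with a minimal sketch. It assumes the text is Arabic, which the passage does not state explicitly, and the specific letter mapping and the function name `normalize` are illustrative choices, not the rules reported by Magooda et al.

```python
import re

# Assumption: Arabic text. The range U+064B..U+0652 covers the common
# short-vowel marks (tashkeel); U+0670 is superscript alef and U+0640 is
# tatweel (elongation), both commonly stripped as well.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670\u0640]")

# Hypothetical normalization table for letters that are often written
# interchangeably. Real systems may use a different or larger mapping.
_LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda      -> bare alef
    "\u0649": "\u064A",  # alef maqsura         -> yeh
    "\u0629": "\u0647",  # teh marbuta          -> heh
})


def normalize(text: str) -> str:
    """Strip diacritics, then map interchangeable letters to one form."""
    without_marks = _DIACRITICS.sub("", text)
    return without_marks.translate(_LETTER_MAP)


def same_after_normalization(a: str, b: str) -> bool:
    """Compare two passages word by word after normalization."""
    return normalize(a).split() == normalize(b).split()
```

Applying `normalize` to both the suspicious and the source passage before comparison lets an exact-match detector treat the two passages of Fig. 3 as identical, at the cost of occasionally merging words that differ only in these marks or letters.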