
Figure V-13. Sensitivity of the performance of Stamatatos’ method to the size of the selected subset of n-grams (in percent) and to the n-gram length. N-grams are selected from profiles sorted by frequency, starting from either the least frequent n-grams (En-1 and Ar-1) or the most frequent n-grams (En-2 and Ar-2). Performance is computed on English (En-1 and En-2) and Arabic (Ar-1 and Ar-2) documents. In charts En-1 and Ar-1, the x-axis values marked with an asterisk (*) denote the sizes of sub-profiles that contain only n-grams with frequency = 1, whatever proportion of the document’s full profile they represent.

Chapter V. Character N-grams as the Only Intrinsic Evidence of Plagiarism
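The sub-profile selection described in the caption can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names are hypothetical, and it assumes that the percentage refers to the number of distinct n-grams in the profile and that ties in frequency are broken alphabetically.

```python
from collections import Counter


def char_ngram_profile(text, n):
    """Count every overlapping character n-gram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def select_sub_profile(profile, percent, least_frequent_first=True):
    """Keep `percent` % of the distinct n-grams of a frequency-sorted profile.

    Assumption: the percentage applies to distinct n-grams, rounded down.
    least_frequent_first=True mimics the En-1/Ar-1 setting,
    least_frequent_first=False the En-2/Ar-2 setting.
    """
    sign = 1 if least_frequent_first else -1
    # Sort by frequency; ties are broken alphabetically (an assumption).
    ordered = sorted(profile.items(), key=lambda kv: (sign * kv[1], kv[0]))
    keep = len(ordered) * percent // 100
    return dict(ordered[:keep])


def hapax_sub_profile(profile):
    """Keep only n-grams with frequency 1 (the asterisked x-axis values)."""
    return {gram: count for gram, count in profile.items() if count == 1}
```

For example, `select_sub_profile(char_ngram_profile(text, 3), 20)` would keep the least frequent fifth of the distinct character 3-grams of `text`, while `hapax_sub_profile` ignores the percentage and keeps every n-gram that occurs exactly once.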
