A Framework for Plagiarism Detection in Arabic Documents
2015, Computer Science & Information Technology (CS & IT)
https://doi.org/10.5121/CSIT.2015.50201
Abstract
We are developing a web-based system to detect plagiarism in written Arabic documents. This paper describes the system's proposed framework, which comprises two main components, one global and one local. The global component is heuristics-based: given a potentially plagiarized document, it constructs a set of representative queries using the best-performing heuristics. These queries are then submitted to Google via Google's search API to retrieve candidate source documents from the Web. The local component carries out detailed similarity computations, combining different similarity computation techniques to determine which parts of the given document are plagiarized and from which of the retrieved source documents. Since this is an ongoing research project, the quality of the overall system has not yet been evaluated.
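As an illustration only, the two-component design described in the abstract can be sketched in Python as follows. The abstract does not name the query heuristics or the similarity measures, so everything specific here is an assumption: `build_queries` ranks sentences by a simple word-rarity score, `containment` uses word 3-gram overlap, the 0.5 threshold is arbitrary, and `search` is a hypothetical callable that stands in for the Google search step and returns a mapping from source identifier to source text. No Arabic-specific normalization or stemming is performed.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # \w is Unicode-aware for str patterns, so Arabic letters count as word characters.
    return re.findall(r"\w+", text.lower())


def split_sentences(text: str) -> List[str]:
    # Split on Latin and Arabic sentence-ending punctuation and on newlines.
    parts = re.split(r"[.!?؟\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def build_queries(document: str, max_queries: int = 3, query_len: int = 8) -> List[str]:
    """Global step (assumed heuristic): favour sentences rich in rare words."""
    freq = Counter(tokenize(document))
    scored = []
    for sentence in split_sentences(document):
        words = tokenize(sentence)
        rarity = sum(1.0 / freq[w] for w in words)
        scored.append((rarity, words))
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    return [" ".join(words[:query_len]) for _, words in ranked[:max_queries]]


def shingles(words: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    # Set of overlapping word n-grams.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(part: str, source: str, n: int = 3) -> float:
    """Local step (assumed measure): share of the part's n-grams found in the source."""
    a = shingles(tokenize(part), n)
    b = shingles(tokenize(source), n)
    return len(a & b) / len(a) if a else 0.0


def detect(
    document: str,
    search: Callable[[str], Dict[str, str]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    # Global component: retrieve candidate sources for each query.
    candidates: Dict[str, str] = {}
    for query in build_queries(document):
        candidates.update(search(query))

    # Local component: match each sentence against every candidate source.
    findings = []
    for sentence in split_sentences(document):
        score, source_id = max(
            ((containment(sentence, text), sid) for sid, text in candidates.items()),
            default=(0.0, ""),
        )
        if score >= threshold:
            findings.append((sentence, source_id, score))
    return findings
```

Passing the retrieval step in as a callable keeps the sketch independent of any particular search API; a real deployment would wrap the search service behind the same interface and would likely combine several similarity measures rather than the single one shown here.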