Weighted k-Nearest Neighbour for Image Spam Classification

2021, Iraqi Journal of Science

https://doi.org/10.24996/IJS.2021.62.3.32

Abstract

E-mail is an efficient and reliable data exchange service. Spam consists of unsolicited e-mail messages sent indiscriminately in bulk, usually for commercial purposes. Obfuscated image spam is one of the newer tricks for bypassing text-based and Optical Character Recognition (OCR)-based spam filters. Detecting image spam from the visual features of the image has the advantage of efficiency, both reducing computational cost and improving performance. In this paper, an image spam detection scheme is presented. Suitable image processing techniques are used to capture image features that differentiate spam images from non-spam ones. Weighted k-nearest neighbor, a simple yet powerful machine learning algorithm, is used as the classifier. The scheme was evaluated on two datasets, and the results confirm its effectiveness. The first is a real benchmark dataset; the second is a realistic, modern, and more challenging dataset collected from social media and several publicly available image spam datasets. The obtained accuracy was 99.36% on the benchmark dataset and 91% on the proposed dataset.
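
To illustrate the classifier named in the abstract, the following is a minimal sketch of a distance-weighted k-nearest-neighbor vote. The abstract does not state the weighting function, the distance metric, the value of k, or the feature vectors used in the paper, so everything here is an assumption made for illustration: inverse-distance weights, Euclidean distance, k = 3, the hypothetical function name `weighted_knn_predict`, and the toy two-dimensional feature values.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_x, train_y, query, k=3, eps=1e-9):
    """Label `query` by an inverse-distance-weighted vote of its k nearest neighbors.

    Assumption: weight = 1 / (distance + eps). The paper's abstract does not
    specify its weighting scheme, so this is only one common choice.
    """
    # Euclidean distance from the query to every training vector.
    dists = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep the k closest training samples.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Closer neighbors contribute larger votes to their class.
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + eps)
    return max(votes, key=votes.get)


# Toy 2-D feature vectors (hypothetical values, not taken from the paper's datasets).
train_x = [(0.3, 0.3), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = ["non-spam", "spam", "spam", "spam"]

# The 3 nearest neighbors of this query are one close non-spam sample and two
# distant spam samples; the weighted vote favors the close one.
print(weighted_knn_predict(train_x, train_y, (0.4, 0.4), k=3))
```

In this toy case an unweighted majority vote over the same three neighbors would pick "spam" (two votes to one), whereas the weighted vote picks "non-spam" because the single non-spam neighbor is much closer to the query. That sensitivity to distance is the motivation for weighting the neighbors.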
