Detection of Web site visitors based on fuzzy rough sets

2017, Soft Computing

https://doi.org/10.1007/S00500-016-2476-4

Abstract

Despite the emergence of Web 2.0 applications and the growing demand for well-behaved Web robots, malicious robots can pose irreparable risks to Web sites. Regardless of their behavior, Web robots may consume bandwidth and degrade the performance of Web servers. Although many notable studies have tried to characterize and classify Web visitors, little attention has been paid to feature selection, that is, to dynamically choosing the attributes used to describe Web sessions. It is also practically important to rely on an accurate clustering technique that can handle a huge number of samples in a reasonable amount of time. Therefore, in this paper, a new algorithm, fuzzy rough set-Web robot detection (FRS-WRD), is proposed based on fuzzy rough set theory to better characterize and cluster the Web visitors of three real Web sites. External evaluations show that, compared with state-of-the-art algorithms, FRS-WRD achieves better results: a G-mean of 95%, a Jaccard of 88%, an entropy of 0.36, and a purity of 96%. Moreover, according to the confusion matrices, it detects malicious Web visitors more accurately.
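The four external measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below assumes the common textbook definitions: purity and entropy computed per cluster against ground-truth classes, Jaccard computed over sample pairs, and G-mean taken as the geometric mean of sensitivity and specificity. The function names and the toy labels are hypothetical and are not taken from FRS-WRD.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def _class_counts(clusters, labels):
    """Group ground-truth labels by cluster id (helper, hypothetical name)."""
    counts = {}
    for c, y in zip(clusters, labels):
        counts.setdefault(c, Counter())[y] += 1
    return counts


def purity(clusters, labels):
    """Share of samples that belong to the majority class of their cluster."""
    counts = _class_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of the class entropy (in bits) inside each cluster."""
    n = len(labels)
    total = 0.0
    for cnt in _class_counts(clusters, labels).values():
        size = sum(cnt.values())
        h = -sum((k / size) * log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


def jaccard(clusters, labels):
    """Pair-counting Jaccard: pairs grouped together in both partitions
    divided by pairs grouped together in at least one of them."""
    both = either = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = labels[i] == labels[j]
        both += same_cluster and same_class
        either += same_cluster or same_class
    return both / either if either else 1.0


def g_mean(predicted, labels, positive):
    """Geometric mean of sensitivity and specificity for one positive class."""
    tp = sum(p == positive and y == positive for p, y in zip(predicted, labels))
    fn = sum(p != positive and y == positive for p, y in zip(predicted, labels))
    tn = sum(p != positive and y != positive for p, y in zip(predicted, labels))
    fp = sum(p == positive and y != positive for p, y in zip(predicted, labels))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


# Toy data (assumed for illustration): six sessions in two clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["robot", "robot", "human", "human", "human", "human"]

# Label each cluster with its majority class to obtain per-session predictions.
majority = {c: cnt.most_common(1)[0][0]
            for c, cnt in _class_counts(clusters, labels).items()}
predicted = [majority[c] for c in clusters]

print(purity(clusters, labels), entropy(clusters, labels))
print(jaccard(clusters, labels), g_mean(predicted, labels, "robot"))
```

Higher purity, Jaccard, and G-mean indicate better agreement with the ground truth, whereas lower entropy is better, which is why the abstract reports entropy as a small value next to the percentage scores.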
