Detection of Web site visitors based on fuzzy rough sets
2017, Soft Computing
https://doi.org/10.1007/S00500-016-2476-4
Abstract
Although Web 2.0 applications increasingly rely on well-behaved Web robots, malicious robots can expose Web sites to irreparable harm. Regardless of their behavior, Web robots can consume bandwidth and degrade the performance of Web servers. Many notable studies have tried to characterize and classify Web visitors, but little attention has been paid to feature selection, that is, to dynamically choosing the attributes used to describe Web sessions. It is also practically important to have an accurate clustering technique that can handle a very large number of samples in a reasonable amount of time. This paper therefore proposes a new algorithm, fuzzy rough set-Web robot detection (FRS-WRD), which uses fuzzy rough set theory to better characterize and cluster the Web visitors of three real Web sites. External evaluations show that FRS-WRD outperforms state-of-the-art algorithms, achieving a G-mean of 95%, a Jaccard index of 88%, an entropy of 0.36, and a purity of 96%. Moreover, the confusion matrices show that it detects malicious Web visitors better.
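To make the four external evaluation measures named in the abstract concrete, the following is a minimal Python sketch that computes purity, entropy, the Jaccard index, and G-mean for a clustering compared against known class labels. It uses common textbook definitions, which may differ in detail from those used in the paper. The function names, the majority-vote mapping from clusters to classes, and the toy session labels are all assumptions made for illustration; none of this is taken from FRS-WRD itself.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def _class_counts(labels, clusters):
    """Map each cluster id to a Counter of the true classes it contains."""
    counts = defaultdict(Counter)
    for y, c in zip(labels, clusters):
        counts[c][y] += 1
    return counts


def purity(labels, clusters):
    """Fraction of samples that belong to the majority class of their cluster."""
    counts = _class_counts(labels, clusters)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(labels, clusters):
    """Size-weighted mean of per-cluster class entropy (base 2); lower is better."""
    n = len(labels)
    total = 0.0
    for c in _class_counts(labels, clusters).values():
        size = sum(c.values())
        h = -sum((k / size) * math.log2(k / size) for k in c.values())
        total += (size / n) * h
    return total


def jaccard(labels, clusters):
    """Pair-counting Jaccard index: a / (a + b + c) over all sample pairs."""
    a = b = c = 0
    for i, j in combinations(range(len(labels)), 2):
        same_class = labels[i] == labels[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            a += 1          # pair agrees in both partitions
        elif same_class:
            b += 1          # same class, split across clusters
        elif same_cluster:
            c += 1          # different classes, merged into one cluster
    return a / (a + b + c) if (a + b + c) else 0.0


def majority_labels(labels, clusters):
    """Assumed mapping: each cluster predicts its most frequent true class."""
    counts = _class_counts(labels, clusters)
    winner = {cid: cnt.most_common(1)[0][0] for cid, cnt in counts.items()}
    return [winner[c] for c in clusters]


def g_mean(labels, predicted, positive):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    tp = sum(y == positive and p == positive for y, p in zip(labels, predicted))
    fn = sum(y == positive and p != positive for y, p in zip(labels, predicted))
    tn = sum(y != positive and p != positive for y, p in zip(labels, predicted))
    fp = sum(y != positive and p == positive for y, p in zip(labels, predicted))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Invented toy data: six sessions, two clusters.
    labels = ["robot", "robot", "robot", "human", "human", "human"]
    clusters = [0, 0, 1, 1, 1, 1]
    predicted = majority_labels(labels, clusters)
    print("purity ", purity(labels, clusters))
    print("entropy", entropy(labels, clusters))
    print("jaccard", jaccard(labels, clusters))
    print("g-mean ", g_mean(labels, predicted, positive="robot"))
```

Purity, Jaccard, and G-mean are higher-is-better, while entropy is lower-is-better, which is why the abstract reports entropy as a small value (0.36) next to percentages for the other three. The pair-counting Jaccard loop is quadratic in the number of sessions, so on large logs it would need a contingency-table formulation instead.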