Rough Sets Clustering and Markov model for Web Access Prediction
2006
Abstract
Discovering user access patterns from web access logs is increasingly important for building adaptive web servers that respond to individual users' behavior. The variety of user behaviors in accessing information is also growing, which has a great impact on network utilization. In this paper, we present a rough set clustering approach that clusters web transactions from web access logs and uses a Markov model for next-access prediction. Using this approach, users can effectively mine web log records to discover and predict access patterns. We perform experiments using real web trace logs collected from www.dusit.ac.th servers. To improve its prediction ratio, the model includes a rough sets scheme in which a similarity measure computes the similarity between two sequences using the upper approximation.
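The two ideas named in the abstract, grouping transactions through a similarity upper approximation and predicting the next access with a Markov model, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the similarity measure, so Jaccard similarity on the sets of visited pages is used here; the Markov model is taken to be first order; and the function names, the threshold of 0.5, and the toy transactions are hypothetical rather than taken from the paper or its data.

```python
from collections import Counter, defaultdict


def similarity(a, b):
    """Jaccard similarity of the page sets of two transactions (assumed measure)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def upper_approximation(members, transactions, threshold):
    """Indices of all transactions that are similar to at least one member.

    This is the standard upper approximation under a similarity (tolerance)
    relation: every element whose similarity class intersects `members`.
    """
    return {
        j
        for j, t in enumerate(transactions)
        if any(similarity(transactions[i], t) >= threshold for i in members)
    }


def train_markov(sequences):
    """Count first-order page-to-page transitions (assumed model order)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, page):
    """Most frequent successor of `page`, or None if the page was never left."""
    if page not in counts:
        return None
    return counts[page].most_common(1)[0][0]


# Hypothetical toy transactions, not data from the paper.
transactions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["home", "courses", "exams"],
    ["library", "catalog"],
]

# Transactions that could belong to the same cluster as transaction 0.
cluster = upper_approximation({0}, transactions, threshold=0.5)

# Train the predictor only on that cluster's transactions.
model = train_markov([transactions[i] for i in sorted(cluster)])
print(sorted(cluster), predict_next(model, "home"))
```

Training one transition table per cluster, as done here, is only one plausible way to combine the two steps; the abstract does not say how the clusters feed the Markov model.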