Preprocessing and mining web log data for web personalization
2003, AI*IA 2003: Advances in …
https://doi.org/10.1007/978-3-540-39853-0_20
Abstract
We describe the web usage mining activities of an on-going project, called ClickWorld, that aims at extracting models of the navigational behaviour of a web site's users. The models are inferred from the access logs of a web server by means of data and web mining techniques. The extracted knowledge is used to offer users a personalized and proactive view of the web services. We first describe the preprocessing steps on access logs that are necessary to clean, select and prepare data for knowledge extraction. We then report two sets of experiments: the first tries to predict the sex of a user from the visited web pages, and the second tries to predict whether a user might be interested in visiting a section of the site.
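The abstract does not spell out the preprocessing steps, so the following is only a minimal sketch of what cleaning, selecting and preparing access-log data could look like, not the ClickWorld implementation. It assumes logs in Common Log Format, treats the client host as the user identifier, keeps only successful GET requests for pages, and splits sessions after 30 minutes of inactivity. The names `LOG_RE`, `SKIP_EXT`, `TIMEOUT`, `parse`, `keep` and `sessions`, as well as the list of skipped file extensions, are hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format
#   host ident user [time] "METHOD url PROTOCOL" status size
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Assumption: requests for these resources are not page views.
SKIP_EXT = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
# Assumption: 30 minutes of inactivity closes a session.
TIMEOUT = timedelta(minutes=30)


def parse(line):
    """Turn one log line into a dict, or return None if it is malformed."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def keep(rec):
    """Cleaning and selection: successful GET requests for pages only."""
    path = rec["url"].split("?")[0].lower()
    return (
        rec["method"] == "GET"
        and 200 <= rec["status"] < 300
        and not path.endswith(SKIP_EXT)
    )


def sessions(lines):
    """Group the kept requests into (host, [urls]) sessions."""
    recs = [r for r in map(parse, lines) if r is not None and keep(r)]
    recs.sort(key=lambda r: (r["host"], r["time"]))
    out = []
    last_host, last_time = None, None
    for r in recs:
        # Start a new session on a new host or after a long pause.
        if r["host"] != last_host or r["time"] - last_time > TIMEOUT:
            out.append((r["host"], []))
        out[-1][1].append(r["url"])
        last_host, last_time = r["host"], r["time"]
    return out
```

Each resulting session is a list of visited pages, which is the kind of input a classifier would need for the two prediction tasks mentioned in the abstract. Identifying users by host alone is a simplification; a real deployment would more likely rely on cookies or login data where available.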
References (16)
- D. W. Aha, D. Kibler, and M. K. Albert. Instance-based learning algorithms. Machine Learning, 6(1):37-66, 1991.
- B. Berendt, B. Mobasher, M. Spiliopoulou, and M. Nakagawa. A framework for the evaluation of session reconstruction heuristics in web usage analysis. INFORMS Journal on Computing, 15(2), 2003.
- B. Berendt and M. Spiliopoulou. Analysis of navigation behaviour in web sites integrating multiple information systems. VLDB Journal, 9(1):56-75, 2000.
- R. Cooley, M. Deshpande, J. Srivastava, and P. N. Tan. Web usage mining: Discovery and applications of usage patterns from web data. ACM SIGKDD Explorations, 1(2), January 2000.
- H. Dai and B. Mobasher. A road map to more effective web personalization: Integrating domain knowledge with web usage mining. In Proceedings of the International Conference on Internet Computing 2003 (IC03), 2003.
- M. Eirinaki and M. Vazirgiannis. Web mining for web personalization. ACM Transactions on Internet Technology (TOIT), 3(1):1-27, 2003.
- J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Mateo, CA, 2000.
- K. P. Joshi, A. Joshi, Y. Yesha, and R. Krishnapuram. Warehousing and mining web logs. In Proc. of ACM CIKM Workshop on Web Information and Data Management (WIDM'99), pages 63-68. ACM, 1999.
- KDnuggets. Software for web mining. http://www.kdnuggets.com/software/web.html.
- R. Kosala and H. Blockeel. Web mining research: A survey. SIGKDD Explorations, 2(1):1-15, 2000.
- W. Li, J. Han, and J. Pei. CMAR: Accurate and efficient classification based on multiple class-association rules. In IEEE International Conference on Data Mining, pages 369-376, 2001.
- D. Murray and K. Durrell. Inferring demographic attributes of anonymous internet users. In Web Usage Analysis and User Profiling Workshop, volume 1836 of Lecture Notes in Computer Science, pages 7-20. Springer, 2000.
- C. Pohle and M. Spiliopoulou. Building and exploiting ad hoc concept hierarchies for web log analysis. In Proceedings of DaWaK 2002, volume 2454 of LNCS, pages 83-93, 2002.
- S. Ruggieri. Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering, 14:438-444, 2002.
- M. Spiliopoulou and L. C. Faulstich. WUM: a Web Utilization Miner. In Proceedings of the EDBT Workshop WebDB98, volume 1590 of LNCS, pages 109-115, 1998.
- M. Sweiger, M. R. Madsen, J. Langston, and H. Lombard. Clickstream Data Warehousing. John Wiley & Sons, 2002.