Pre-Processing: Procedure on Web Log File for Web Usage Mining
2014
Abstract
The World Wide Web has become a popular and interactive medium for transferring information. Web usage mining is the area of data mining that deals with the discovery and analysis of usage patterns from Web data, specifically web logs, in order to improve web-based applications. It consists of three phases: preprocessing, pattern discovery, and pattern analysis. After completing these three phases, the user can find the required usage patterns and use this information for specific needs. The web access log file keeps a record of every request made by users. However, the data stored in the log files does not give accurate details of users' accesses to the Web site. Preprocessing of the Web log data is therefore the first and an important phase before the web log file can be used for pattern discovery and pattern analysis. The preprocessed Web log file is then suitable for the discovery and analysis of useful information referred to as Web mini...
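As a rough illustration of what the preprocessing phase might involve, the sketch below parses access log lines and discards entries that are unlikely to represent page views. It assumes the log uses the Common Log Format; the filtering rules (GET only, 2xx status, no image or stylesheet requests) and the names `CLF_PATTERN`, `IRRELEVANT_SUFFIXES`, `parse_line`, and `clean` are illustrative assumptions, not the procedure defined in the paper.

```python
import re

# Common Log Format: host ident user [time] "METHOD url PROTOCOL" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical list of suffixes treated as embedded resources, not page views.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


def clean(lines):
    """Keep successful GET requests for pages and drop everything else."""
    kept = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # malformed line
        path = record["url"].split("?", 1)[0].lower()
        if record["method"] != "GET":
            continue  # assumed rule: only GET requests count as page views
        if not 200 <= record["status"] < 300:
            continue  # assumed rule: failed or redirected requests are dropped
        if path.endswith(IRRELEVANT_SUFFIXES):
            continue  # embedded resource rather than a page
        kept.append(record)
    return kept


SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2014:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2014:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '192.0.2.2 - - [10/Oct/2014:13:56:01 +0000] "GET /missing.html HTTP/1.1" 404 209',
    '192.0.2.3 - - [10/Oct/2014:13:57:12 +0000] "POST /login HTTP/1.1" 200 128',
    'not a log line',
]

cleaned = clean(SAMPLE_LOG)
for record in cleaned:
    print(record["host"], record["url"])
```

Which requests count as irrelevant depends on the site and on the mining goal, so the suffix list and status rule would normally be adjusted per study.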
Related papers
The World Wide Web has grown beyond expectations. The internet is growing day by day, and so is the number of online users. Extracting interesting knowledge from such huge volumes of data demands new logic and new methods. Users spend much of their time on the internet, and their behavior differs from one user to another. Web usage mining is the category of web mining that helps in automatically discovering user access patterns. It is a leading research area in web mining, concerned with the behavior of web users. This paper emphasizes the prediction of user behavior using web server log records, clickstream records, and user information. The web pages that users visit, frequently followed hyperlinks, and frequently accessed pages are stored in web server log files. A web log, together with the identity of the user, captures their browsing behavior on a website; the paper discusses this behavior through an analysis of different algorithms and methods.
2013
As there is a huge amount of data available online, the World Wide Web is a fertile area for data mining research. In recent years, various surveys have been performed on static website data for web usage mining. This paper deals with the web usage mining of a website hosted on an IIS web server. Web usage mining is the area of data mining that deals with the discovery and analysis of usage patterns from Web data, specifically web logs, in order to improve web-based applications. Web usage mining consists of three phases: pre-processing, pattern discovery, and pattern analysis. After completing these three phases, the user can find the required usage patterns and use this information for specific needs. The research is performed on a log file using Log Parser.
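IIS commonly writes logs in the W3C extended log file format, in which a `#Fields:` directive names the space-separated columns of the lines that follow. The sketch below reads such a log in plain Python; it is not the Log Parser tool mentioned above, and the sample lines and the function name `parse_w3c_log` are made up for illustration.

```python
def parse_w3c_log(lines):
    """Return one dict per request, keyed by the names in the '#Fields:' directive."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines; only '#Fields:' affects how data lines are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip lines that do not match the declared columns
        records.append(dict(zip(fields, values)))
    return records


# Made-up sample in the W3C extended format.
SAMPLE_IIS_LOG = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Version: 1.0",
    "#Date: 2013-05-02 17:42:15",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2013-05-02 17:42:15 GET /default.htm 200",
    "2013-05-02 17:42:16 GET /images/logo.gif 304",
]

records = parse_w3c_log(SAMPLE_IIS_LOG)
for record in records:
    print(record["cs-uri-stem"], record["sc-status"])
```

Reading the column names from the directive, rather than hard-coding positions, keeps the parser usable when the server is configured to log a different set of fields.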
2007
In this paper we focus on log data preprocessing, the first step of a common Web Usage Mining process. In particular, we present LODAP (LOg DAta Preprocessor), a software tool that we designed and implemented to preprocess log data. The working scheme of LODAP comprises several steps. First, log files are cleaned by removing irrelevant data. Then, the remaining requests are structured into user sessions, encoding the browsing behavior of users. Next, uninteresting sessions and the least visited pages are removed in order to reduce the size of the data concerning the previously extracted user sessions. In addition, LODAP can create reports containing the results obtained in each step and information summaries mined from the analysis of the considered log files. During preprocessing, the analyst is guided by a sequence of panels that make up the tool's wizard-based interface. Each panel is a graphical window that offers a basic function of the preprocessor. Preliminary results on log files of a specific Web site show that the implemented tool can effectively reduce the log data size and identify user sessions that encode user browsing behavior in a significant manner.
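The step of structuring requests into user sessions can be sketched with a simple inactivity timeout. This is a common heuristic and not necessarily the rule LODAP applies; the 30-minute threshold, the use of the client address as the user key, and the name `build_sessions` are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; not a value taken from the paper.
SESSION_TIMEOUT = timedelta(minutes=30)


def build_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (user_key, timestamp, url) tuples into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout`.
    """
    by_user = defaultdict(list)
    for user, timestamp, url in requests:
        by_user[user].append((timestamp, url))

    sessions = []
    for user in sorted(by_user):
        visits = sorted(by_user[user])
        current = [visits[0][1]]
        for (prev_time, _), (time, url) in zip(visits, visits[1:]):
            if time - prev_time > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


def at(hour, minute):
    """Helper for building sample timestamps on one made-up day."""
    return datetime(2014, 10, 10, hour, minute)


SAMPLE_REQUESTS = [
    ("192.0.2.1", at(13, 0), "/index.html"),
    ("192.0.2.1", at(13, 5), "/products.html"),
    ("192.0.2.2", at(13, 6), "/index.html"),
    ("192.0.2.1", at(14, 0), "/contact.html"),
]

sessions = build_sessions(SAMPLE_REQUESTS)
for user, pages in sessions:
    print(user, pages)
```

Here the first user's request at 14:00 arrives 55 minutes after the previous one, so it opens a second session for that user.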
Web usage mining is a key process for extracting information from the web. It provides usage patterns and user navigation paths for a site. Using WUM, an organization can conveniently extract frequent access patterns and improve its website design. In this paper, the main process has three phases: data cleaning, user identification, and session identification. We implement the data cleaning step for web log data using the web usage mining process, and we improve web log preprocessing for accuracy and efficiency.
Web usage mining is a technique for uncovering hidden knowledge and information about web users. A web user is a person who accesses the internet and visits the pages of a website from any device. Web server log files are the data source in web usage mining. When a user visits the pages of a website, they leave a large amount of information behind in the web server log file. Examining the web server log file helps us understand user behavior on the internet. The web server log file includes two types of log information: web server access logs and application server logs. The web log file is therefore an essential part of web mining, which mines hidden knowledge about users such as internet usage patterns and visit characteristics. Our primary focus is to identify the web pages that users visit most, using web mining techniques. This classification helps us determine the behavior of a user on the web. We used pre-processing methods followed by a classification technique to extract hidden knowledge and information about a user from browsing patterns.
2017
An enormous amount of data is available in the form of web documents on the World Wide Web, and it is increasing day by day. Web mining is used to extract useful information from web documents. Web mining is categorized into three types: web content mining, web structure mining, and web usage mining. Web usage mining is the data mining technique that mines web log data from the World Wide Web to extract useful information. Web usage mining is useful for applications such as e-commerce (personalized marketing), counter-terrorism, fraud detection, identification of criminal activities, and web design. This paper explains in detail the process involved in Web Usage Mining, along with Web Usage Mining applications and tools. Keywords: Web Usage Mining, Web log files, Web usage mining process, Web usage mining applications and tools. TYPES OF WEB SERVER LOG FILES There are four types of web server log files [4]: a. Access log file: The access log is used to capture information about the user and has a number of attributes, such as Date,
2016
The Web has recently become a powerful platform for retrieving information and discovering knowledge from web data. Web mining is one application of data mining techniques, used to extract knowledge from web log data. Web mining is generally defined as the preprocessing, discovery, and analysis of useful information from the Web. Web Usage Mining consists of preprocessing, pattern discovery, and pattern analysis. Memory and time usage are compared for two pattern discovery algorithms, Apriori and Frequent Pattern Growth. The aim of this paper is to understand the web usage mining process, including the preprocessing of web usage data and the finding and analysis of frequent patterns. Both algorithms are also compared on the same dataset. With growing internet use, log files are increasing rapidly in size. Preprocessing plays an important role in an efficient mining process because data i...
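To make the pattern discovery step concrete, here is a minimal sketch of the Apriori idea applied to sessions treated as sets of visited pages. It is a plain textbook-style version written for readability, not the implementation benchmarked in the paper; the sample sessions and the `min_support` value are invented.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    `min_support` is an absolute count of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Invented sessions, each reduced to the set of pages visited.
SAMPLE_SESSIONS = [
    {"/index", "/products", "/cart"},
    {"/index", "/products"},
    {"/index", "/contact"},
    {"/products", "/cart"},
]

frequent = apriori(SAMPLE_SESSIONS, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Frequent Pattern Growth reaches the same frequent itemsets without generating candidates level by level, which is the source of the memory and time differences such comparisons measure.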
Because the data available on the web is huge, unstructured, and scattered, it is very hard for users to get relevant information quickly. To address this, website design improvement, content personalization, prefetching, and caching are carried out based on analysis of user behavior. User activities can be captured in a special file called a log file. There are various types of log: server log, proxy server log, and client/browser log. Web usage mining uses these log files to analyze and discover useful patterns. The process of web usage mining involves three interdependent steps: data preprocessing, pattern discovery, and pattern analysis. Among these steps, data preprocessing plays a vital role because log data is unstructured, redundant, and noisy. To improve the later phases of web usage mining, such as pattern discovery and pattern analysis, several data preprocessing techniques have been used, including data cleaning, user identification, session identification, and path completion. This paper discusses all of these techniques in detail. The techniques are also categorized and presented with their advantages and disadvantages, which will help scientists, researchers, and academicians working in this direction.
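Of the techniques listed, path completion is perhaps the least obvious, so a small sketch may help. When a user returns to an earlier page with the browser's back button, the page is often served from cache and the server logs nothing; a common heuristic infers those missing steps from the referrer of the next logged request. The function below is one such heuristic under that assumption and is not taken from the paper; `complete_path` and the sample session are hypothetical.

```python
def complete_path(session):
    """Insert inferred back-button steps into a session.

    `session` is a list of (url, referrer) pairs in request order.
    If a request's referrer is not the most recent page but does appear
    earlier in the path, the pages walked back through are re-added.
    """
    path = []
    for url, referrer in session:
        if path and referrer and referrer != path[-1] and referrer in path:
            # Index of the most recent occurrence of the referrer.
            last_seen = len(path) - 1 - path[::-1].index(referrer)
            # Pages revisited while going back, ending at the referrer.
            backtrack = path[last_seen:-1][::-1]
            path.extend(backtrack)
        path.append(url)
    return path


# Hypothetical session: the user opens A, B, C, then goes back to A and opens D.
SAMPLE_SESSION = [
    ("/A", None),
    ("/B", "/A"),
    ("/C", "/B"),
    ("/D", "/A"),
]

completed = complete_path(SAMPLE_SESSION)
print(completed)
```

The request for `/D` names `/A` as its referrer, so the sketch assumes the user stepped back through `/B` to `/A` before following the link.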
International Journal of Engineering Research and, 2015
The Web has become a popular tool for serving people's information needs from web servers. As the number of web users increases day by day, there is a need to analyze the behavior of these users in order to monitor and improve the performance and throughput of a website. Web usage mining is a data mining application that works with web log files and extracts useful information from the web. Web usage mining has the following phases: data preprocessing, pattern discovery, and pattern analysis. Among them, data preprocessing is the most crucial phase, because without good-quality data it is difficult to identify patterns in user behavior. This paper reviews different data preprocessing methods, namely data collection, data cleaning, user identification, session identification, and path completion, which will help the community select one or a combination of the available techniques to carry out efficient preprocessing and obtain reliable data mining outcomes.
HELIX
The Internet is a huge source of information for retrieval and knowledge search on the WWW, which leads to increased network traffic, access delay, and server overload, resulting in poor web services. Web caching and web prefetching techniques are used to enhance the performance of web services, and web mining techniques play an important role in deciding which web objects should be prefetched from the server and stored in the proxy cache, so that the web objects with a high probability of being requested in the next couple of days form the base of the proxy cache. For efficient web mining and extraction of meaningful usage access patterns, however, the raw log file must be transformed into a meaningful, formatted file. This paper proposes a new dynamic preprocessing technique to create a dynamic training dataset for a prediction model using web mining, and Graph based Substructure Pattern Mining (GSPAN) for improved preprocessing using the proxy log. The proposed model would help reduce the cache size by 40%, thus improving overall performance.

References
- Theint Theint Aye. 2011. Web Log Cleaning for Mining of Web Usage Patterns. IEEE.
- K. R. Suneetha and R. Krishnamoorthi. 2009. Identifying User Behavior by Analyzing Web Server Access Log File. IJCSNS.
- R. Cooley, B. Mobasher and J. Srivastava. 1999. Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems, 1(1), 5-32.
- R. Kosala and H. Blockeel. 2000. Web Mining Research: A Survey. ACM SIGKDD Explorations, 1-15.
- R. Cooley, B. Mobasher and J. Srivastava. 1997. Web Mining: Information and Pattern Discovery on the World Wide Web. 9th IEEE International Conference on Tools with Artificial Intelligence, CA, 558-567.