Jiawei Han

The four key types are: pre-shared secret key; public encryption key (fields are separately encrypted using the public key); optimized public encryption key (used to encrypt a random symmetric key, and then data is encrypted using the symmetric key); and public signature key (used only for signature purposes). ⇒ 8 variants of IKE phase 1: 2 modes x 4 key types. Proof of Identity:

Discovery of Spatial Association Rules in Geographic Information Databases

Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is an important task for understanding and use of spatial data- and knowledge-bases. In this paper, an efficient method for mining strong spatial association rules in geographic information databases is proposed and studied. A spatial association rule is a rule indicating certain association relationship among a set of spatial and possibly some nonspatial predicates. A strong rule indicates that the patterns in the rule have relatively frequent occurrences in the database and strong implication relationships. Several optimization techniques are explored, including a two-step spatial computation technique (approximate computation on large sets, and refined computations on small promising patterns), shared processing in the derivation of large predicates at multiple concept levels, etc. Our analysis shows that interesting association rules can be discovered efficiently in large spatial databases.

Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

IEEE Transactions on Knowledge and Data Engineering, 2004

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
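
The projection-based pattern growth described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sequence is a plain list of single items (the paper handles sequences of itemsets), and the names `prefixspan`, `grow`, and `min_support` are hypothetical. Projected databases are represented as (sequence index, offset) pointers rather than copied suffixes, in the spirit of pseudoprojection.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items.

    Simplifying assumption: every element of a sequence is one item,
    not an itemset as in the general problem statement.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pointers, so no
        # suffix is physically copied.
        occurrences = defaultdict(dict)  # item -> {sequence index: first position}
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                occurrences[seq[pos]].setdefault(sid, pos)

        for item in sorted(occurrences):
            where = occurrences[item]
            if len(where) < min_support:
                continue  # not locally frequent, so never extended
            pattern = prefix + (item,)
            results[pattern] = len(where)
            # Project on the new prefix: continue after the first match.
            grow(pattern, [(sid, pos + 1) for sid, pos in where.items()])

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return results


database = [list("abcb"), list("acb"), list("bc")]
patterns = prefixspan(database, min_support=2)
```

Only items that are frequent inside the current projected database are ever extended, which is what keeps the search from enumerating candidate subsequences that cannot be frequent.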

Knowledge Discovery in Databases: An Attribute-Oriented Approach

Knowledge discovery in databases, or data mining, is an important issue in the development of data- and knowledge-base systems. An attribute-oriented induction method has been developed for knowledge discovery in databases. The method integrates a machine learning paradigm, especially learning-from-examples techniques, with set-oriented database operations and extracts generalized data from actual data in databases. An attribute-oriented concept tree ascension technique is applied in generalization, which substantially reduces the computational complexity of database learning processes. Different kinds of knowledge rules, including characteristic rules, discrimination rules, quantitative rules, and data evolution regularities can be discovered efficiently using the attribute-oriented approach. In addition to learning in relational databases, the approach can be applied to knowledge discovery in nested relational and deductive databases. Learning can also be performed with databases containing noisy data and exceptional cases using database statistics. Furthermore, the rules discovered can be used to query database knowledge, answer cooperative queries and facilitate semantic query optimization. Based upon these principles, a prototyped database learning system, DBLEARN, has been constructed for experimentation.
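
As a rough illustration of concept tree ascension, the sketch below climbs each attribute up a concept hierarchy until it has at most a threshold number of distinct values, then merges identical generalized tuples and counts them. This is a simplified reading of the idea, not the DBLEARN algorithm: the function name `generalize`, the child-to-parent dictionaries, the threshold rule, and the sample data are all assumptions made for the example.

```python
from collections import Counter


def generalize(rows, hierarchies, threshold):
    """Generalize each attribute by concept tree ascension.

    rows        -- non-empty list of equal-length tuples
    hierarchies -- {attribute index: {child concept: parent concept}}
    threshold   -- maximum number of distinct values kept per attribute
    """
    rows = [tuple(r) for r in rows]
    for i in range(len(rows[0])):
        parent = hierarchies.get(i, {})
        while len({r[i] for r in rows}) > threshold:
            climbed = [r[:i] + (parent.get(r[i], r[i]),) + r[i + 1:] for r in rows]
            if climbed == rows:
                break  # no higher-level concept is available
            rows = climbed
    # Identical generalized tuples are merged; the count records how many
    # original tuples each one covers.
    return Counter(rows)


students = [
    ("physics", "Vancouver"),
    ("chemistry", "Toronto"),
    ("history", "Seattle"),
    ("music", "Vancouver"),
]
concept_trees = {
    0: {"physics": "science", "chemistry": "science",
        "history": "arts", "music": "arts"},
    1: {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"},
}
summary = generalize(students, concept_trees, threshold=2)
```

The counts attached to each generalized tuple are the kind of information a quantitative rule would draw on.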

CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets

Association mining may often derive an undesirably large set of frequent itemsets and as...

Data Mining: Concepts and Techniques

Our capabilities of both generating and collecting data have been increasing rapidly in the last several decades. Contributing factors include the widespread use of bar codes for most commercial products, the computerization of many business, scientific and government transactions and managements, and advances in data collection tools ranging from scanned texture and image platforms, to on-line instrumentation in manufacturing and shopping, and to satellite remote sensing systems. In addition, popular use of the World Wide Web as a global information system has flooded us with a tremendous amount of data and information. This explosive growth in stored data has generated an urgent need for new techniques and automated tools that can intelligently assist us in transforming the vast amounts of data into useful information and knowledge.

Metarule-Guided Mining of MultiDimensional Association Rules Using Data Cubes

In this paper, we employ a novel approach to metarule-guided, multi-dimensional association rule ...

A Fast Distributed Algorithm for Mining Association Rules

With the existence of many large transaction databases, the huge amounts of data, the high scalability of distributed systems, and the easy partition and distribution of a centralized database, it is important to investigate efficient methods for distributed mining of association rules. This study discloses some interesting relationships between locally large and globally large itemsets and proposes an interesting distributed association rule mining algorithm, FDM (Fast Distributed Mining of association rules), which generates a small number of candidate sets and substantially reduces the number of messages to be passed when mining association rules. Our performance study shows that FDM has a superior performance over the direct application of a typical sequential algorithm. Further performance enhancement leads to a few variations of the algorithm. 0-8186-7475-X/96 $5.00 © 1996 IEEE
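
One relationship between locally and globally large itemsets follows directly from counting: an itemset whose support reaches a given ratio over the whole database must reach that ratio at one site or more. The sketch below uses only this property to prune candidates before summing counts across sites. It is an illustration under assumptions, not the FDM algorithm: it simulates the sites in one process, exchanges no messages, and the names `local_counts`, `globally_large`, and `min_ratio` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, size):
    """Count every itemset of the given size in one site's transactions."""
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts


def globally_large(sites, min_ratio, size):
    """Return {itemset: global count} for the globally large itemsets."""
    per_site = [local_counts(site, size) for site in sites]

    # Candidate pruning: keep only itemsets that are locally large at
    # one site or more.
    candidates = set()
    for site, counts in zip(sites, per_site):
        for itemset, count in counts.items():
            if count >= min_ratio * len(site):
                candidates.add(itemset)

    total = sum(len(site) for site in sites)
    result = {}
    for itemset in candidates:
        global_count = sum(counts[itemset] for counts in per_site)
        if global_count >= min_ratio * total:
            result[itemset] = global_count
    return result


site_a = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
site_b = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"c", "d"}]
large_pairs = globally_large([site_a, site_b], min_ratio=0.5, size=2)
```

Here `("c", "d")` is locally large at the second site but falls short globally, so it is counted as a candidate and then discarded, while pairs that are large at neither site are never counted globally at all.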

Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases

Concept hierarchies organize data and concepts in hierarchical forms or in certain parti...

Discovery of Multiple-Level Association Rules from Large Databases

Previous studies on mining association rules find rules at a single concept level; however, mining ...
