A Matrix Approach for Association Mining

2001

Abstract

Association mining, a class of data mining techniques, is one of the most researched fields in data mining; its algorithms are designed to discover rules that reflect dependencies among values of an attribute. Because businesses store vast amounts of data, most association mining algorithms are computationally expensive and require many passes over the data. Beyond sequential processing, implementations of data mining ideas should also consider parallel computing environments. This paper presents a new technique for association mining based on a matrix approach, which can be applied in both sequential and parallel environments. In the proposed technique, the data records are scanned only once to construct a frequency vector and a binary association matrix. Two algorithms are presented: one that generates only the maximal large item-sets and one that generates all large item-sets. The number of disk accesses, the CPU time, and the memory space needed for generating large item-sets are O(n), O(N^2), and O(N), respectively, where n is the number of input transactions and N is the number of transaction groups.
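The single-scan construction can be illustrated with a minimal Python sketch. The abstract does not define the data layout, so the following are assumptions made for illustration only: a "transaction group" is taken to be a set of identical transactions, the frequency vector holds one count per group, and the binary association matrix has one row per group and one column per item. The function names `build_matrix` and `support` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_matrix(transactions):
    """Scan the records once, grouping identical transactions (assumed meaning of 'group')."""
    counts = Counter()
    for record in transactions:  # the only pass over the input records
        counts[frozenset(record)] += 1

    groups = list(counts)  # distinct transaction groups, in first-seen order
    items = sorted({item for group in groups for item in group})
    freq = [counts[group] for group in groups]  # frequency vector, one entry per group
    # Binary association matrix: matrix[g][i] is 1 if group g contains item i.
    matrix = [[1 if item in group else 0 for item in items] for group in groups]
    return items, freq, matrix


def support(itemset, items, freq, matrix):
    """Count the transactions containing every item of itemset, using only the matrix."""
    cols = [items.index(item) for item in itemset]
    return sum(f for f, row in zip(freq, matrix) if all(row[c] for c in cols))


transactions = [["a", "b"], ["b", "a"], ["a", "c"], ["b"]]
items, freq, matrix = build_matrix(transactions)
print(items, freq, matrix)
print(support({"a", "b"}, items, freq, matrix))
```

Under these assumptions, the support of any candidate item-set is computed from the N rows of the matrix and the frequency vector, so the n input records never need to be read again.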
