A middleware for developing parallel data mining implementations
2001
https://doi.org/10.1137/1.9781611972719.17
Abstract: Data mining is an interdisciplinary field with applications in diverse areas such as bioinformatics, medical informatics, scientific data analysis, financial analysis, and consumer profiling. In each of these application domains, the amount of data available for analysis has exploded in recent years, making the scalability of data