Visual Data-Mining Techniques

2005, Visualization Handbook

https://doi.org/10.1016/B978-012387582-2/50045-9

Abstract

Never before in history has data been generated at such high volumes as it is today. Exploring and analyzing the vast volumes of data has become increasingly difficult. Information visualization and visual data mining can help to deal with the flood of information. The advantage of visual data exploration is that the user is directly involved in the data mining process. There are a large number of information visualization techniques that have been developed over the last two decades to support the exploration of large data sets. In this paper, we propose a classification of information visualization and visual data mining techniques based on the data type to be visualized, the visualization technique, and the interaction technique. We illustrate the classification using a few examples, and indicate some directions for future work.

Key takeaways

  1. Visual data exploration integrates human insight into data analysis, enhancing hypothesis generation and discovery.
  2. An estimated 1 exabyte of data is generated annually, which calls for advanced visualization techniques for effective exploration.
  3. The three-step visual exploration process includes overview, zoom and filter, and details-on-demand for efficient analysis.
  4. Visual data-mining techniques are classified along three orthogonal dimensions: data type, visualization technique, and interaction technique.
  5. Future work should integrate visualization with traditional data analysis methods to improve the mining process.
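
The three-step exploration process named in the takeaways (overview, zoom and filter, details-on-demand) can be illustrated with a minimal sketch. This is not code from the paper: the sample records, the field names (`region`, `sales`), and the three function names are hypothetical and chosen only for illustration, and plain data summaries stand in for what would be visual displays in a real tool.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "north", "sales": 80},
    {"id": 3, "region": "south", "sales": 200},
    {"id": 4, "region": "south", "sales": 40},
    {"id": 5, "region": "east", "sales": 150},
]


def overview(rows):
    """Step 1: summarize the whole data set (record count per region)."""
    return Counter(row["region"] for row in rows)


def zoom_and_filter(rows, region, min_sales):
    """Step 2: narrow to one region and drop records below a threshold."""
    return [r for r in rows if r["region"] == region and r["sales"] >= min_sales]


def details_on_demand(rows, record_id):
    """Step 3: return the full record for one selected item, or None."""
    return next((r for r in rows if r["id"] == record_id), None)


if __name__ == "__main__":
    print(overview(RECORDS))                      # coarse picture first
    print(zoom_and_filter(RECORDS, "south", 100)) # then a subset of interest
    print(details_on_demand(RECORDS, 3))          # then a single record
```

Each step operates on the output scale of the previous one: the overview covers all records, the filter narrows to a subset, and the detail lookup returns one item. An interactive system would attach these steps to a display and to user input rather than to function calls.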
