Seeing beyond reading: a survey on visual text analytics

2012, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

https://doi.org/10.1002/WIDM.1071

Abstract

We review recent visualization techniques that support tasks requiring the analysis of text documents, from approaches that visually summarize the relevant content of a single document to those that assist exploratory investigation of whole document collections. Techniques are organized by their target input material (either single texts or collections of texts) and by their focus, which may be displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users handle the results of a query posed to a search engine. We describe the approaches adopted by distinct techniques and briefly review the strategies they employ to obtain meaningful text models, how they extract the information required to produce representative visualizations, the tasks they intend to support, and the interaction issues involved, as well as their strengths and limitations. Finally, we present a summary of the techniques, highlighting their goals and distinguishing characteristics. We also briefly discuss some open problems and research directions in visual text mining and text analytics.
