Interactive Analysis of Word Vector Embeddings

2018, Computer Graphics Forum

https://doi.org/10.1111/CGF.13417

Abstract

Word vector embeddings are an emerging tool for natural language processing. They have proven beneficial for a wide variety of language processing tasks. Their utility stems from the ability to encode word relationships within the vector space. Applications range from components in natural language processing systems to tools for linguistic analysis in the study of language and literature. In many of these applications, interpreting embeddings and understanding the encoded grammatical and semantic relations between words is useful but challenging. Visualization can aid in such interpretation of embeddings. In this paper, we examine the role of visualization in working with word vector embeddings. We provide a literature survey to catalogue the tasks for which embeddings are employed across a broad range of applications. Based on this survey, we identify key tasks and their characteristics. Then, we present visual interactive designs that address many of these tasks. The designs integrate into an exploration and analysis environment for embeddings. Finally, we provide example use cases for the designs and discuss domain user feedback.
