Semantic Visualization for Short Texts with Word Embeddings

2017, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence

https://doi.org/10.24963/IJCAI.2017/288

Abstract

Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, and status updates. Due to their short lengths, it is difficult to model their semantics, as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a novel semantic visualization model that seamlessly integrates visualization coordinates, topic distributions, and word vectors. We propose a model called GaussianSV, which outperforms pipelined baselines that derive topic models and visualization coordinates as disjoint steps, as well as semantic visualization baselines that do not consider word embeddings.
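To make the core idea concrete, the following is a minimal sketch, not the GaussianSV model itself: short texts are represented by averaging pre-trained word vectors (supplying the semantic signal that sparse co-occurrences cannot), and the documents are then projected to 2-D visualization coordinates. The toy embedding table and the PCA projection are illustrative stand-ins for vectors trained on a large external corpus and for the model's joint visualization objective.

```python
import numpy as np

# Toy pre-trained word embeddings; stand-ins for GloVe/word2vec vectors
# learned on a large auxiliary corpus. All words and values are illustrative.
EMBEDDINGS = {
    "soccer": np.array([0.9, 0.1, 0.0]),
    "goal":   np.array([0.8, 0.2, 0.1]),
    "match":  np.array([0.7, 0.1, 0.2]),
    "stock":  np.array([0.1, 0.9, 0.0]),
    "market": np.array([0.0, 0.8, 0.2]),
    "shares": np.array([0.1, 0.7, 0.1]),
}

def embed_short_text(text):
    """Average the word vectors of a short text.

    With only a handful of words per document, co-occurrence counts are
    too sparse to estimate topics directly; the pre-trained vectors carry
    the missing semantic signal from the larger corpus.
    """
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    return np.mean(vecs, axis=0)

def project_2d(doc_vectors):
    """Project document vectors onto 2-D visualization coordinates via PCA
    (a pipelined stand-in for the joint objective of a semantic
    visualization model)."""
    X = np.asarray(doc_vectors)
    X = X - X.mean(axis=0)                      # center the documents
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                          # top-2 principal directions

docs = ["soccer goal match", "goal match",
        "stock market shares", "market shares"]
coords = project_2d([embed_short_text(d) for d in docs])
# Semantically similar short texts land near each other in the 2-D plane,
# even though e.g. "soccer goal match" and "goal match" share few words.
```

Note that this pipeline derives the embedding and the coordinates in disjoint steps; the point of a semantic visualization model is to fit topic distributions and coordinates jointly, which the abstract reports outperforming such pipelined baselines.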
