
Concept-Oriented Transformers for Visual Sentiment Analysis

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

https://doi.org/10.1145/3539597.3570437

Abstract

On the media-rich Web, detecting the sentiment signals expressed in images would support multiple applications, e.g., measuring customer satisfaction from online reviews or analyzing trends and opinions on social media. Given an image, visual sentiment analysis aims at recognizing positive or negative sentiment, and occasionally neutral sentiment as well. A nascent yet promising direction is Transformer-based models applied to image data, whereby the Vision Transformer (ViT) achieves remarkable performance on large-scale vision benchmarks. In addition to investigating the fitness of ViT for visual sentiment analysis, we further incorporate concept orientation into the self-attention mechanism, the core component of the Transformer. The proposed model captures the relationships between image features and specific concepts. We conduct extensive experiments on the Visual Sentiment Ontology (VSO) and Yelp.com online review datasets, showing that not only does the proposed model significantly improve upon the base model ViT in detecting visual sentiment, but it also outperforms previous visual sentiment analysis models with narrowly defined orientations. Additional analyses yield insightful results and a better understanding of the concept-oriented self-attention mechanism.

CCS Concepts

• Information systems → Web mining; Multimedia information systems
• Computing methodologies → Computer vision
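The abstract does not spell out the exact formulation of the concept-oriented self-attention, so the following is only a minimal, hypothetical sketch (PyTorch) of the general idea it describes: a set of learnable concept embeddings is mixed with the ViT patch tokens inside a self-attention layer, so that attention can relate image features to specific concepts. All names (ConceptSelfAttention, num_concepts, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: concept tokens joined with ViT patch tokens in self-attention.
import torch
import torch.nn as nn


class ConceptSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, num_concepts: int = 16):
        super().__init__()
        # Learnable embeddings, one per concept (e.g., adjective-noun pairs).
        self.concepts = nn.Parameter(torch.randn(1, num_concepts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, dim) from a ViT backbone.
        b = patch_tokens.size(0)
        concepts = self.concepts.expand(b, -1, -1)
        # Concatenate concept tokens with patch tokens so every attention head
        # can relate image features to concept representations.
        tokens = self.norm(torch.cat([concepts, patch_tokens], dim=1))
        out, _ = self.attn(tokens, tokens, tokens)
        # Keep only the concept slots; they now summarize the image in terms of
        # the concepts and could feed a sentiment classification head.
        return out[:, : self.concepts.size(1)]


if __name__ == "__main__":
    layer = ConceptSelfAttention(dim=768)
    dummy_patches = torch.randn(2, 196, 768)  # e.g., 14x14 patches from ViT-B/16
    concept_features = layer(dummy_patches)
    print(concept_features.shape)  # torch.Size([2, 16, 768])
```

In this sketch the concept embeddings play a role analogous to extra class tokens: they attend over the patch features, and the resulting concept-conditioned representations would be pooled for the final positive/negative (or neutral) prediction. The actual model in the paper may weight or structure the attention differently.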
