
Plot and Rework: Modeling Storylines for Visual Storytelling

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

https://doi.org/10.18653/v1/2021.findings-acl.390

Abstract

Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via a re-evaluating training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority.
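The two stages the abstract describes can be illustrated with a minimal sketch. This is a hypothetical toy, not the authors' implementation: `best_storyline` does a Viterbi-style search over candidate events per image using an assumed `transition_score` function (standing in for the learned story graph edge weights), and `rework` keeps a revision of a draft only when an assumed scorer prefers it (standing in for the re-evaluating training process).

```python
# Hypothetical sketch of the plot-then-rework idea (not the paper's code).
# "Plot": pick the highest-scoring path of events, one event per image.
# "Rework": iteratively revise a draft, keeping a revision only if a
# scorer (here, any callable) prefers it over the current best.

def best_storyline(candidates, transition_score):
    """Viterbi-style search: candidates[i] is the event list for image i."""
    paths = [([c], 0.0) for c in candidates[0]]
    for step in candidates[1:]:
        new_paths = []
        for event in step:
            # Best predecessor path for this event.
            path, score = max(
                ((p, s + transition_score(p[-1], event)) for p, s in paths),
                key=lambda x: x[1],
            )
            new_paths.append((path + [event], score))
        paths = new_paths
    return max(paths, key=lambda x: x[1])[0]

def rework(draft, revise, score, rounds=3):
    """Accept a revision only when the scorer improves on the current best."""
    best = draft
    for _ in range(rounds):
        candidate = revise(best)
        if score(candidate) > score(best):
            best = candidate
    return best
```

With a toy transition function that rewards only the pair ("a", "c"), `best_storyline([["a", "b"], ["c", "d"]], ...)` selects the path `["a", "c"]`; the actual framework learns these weights from data.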
