Automatic Photo to Ideophone Manga Matching
2020, arXiv
Abstract
Photo applications offer tools for annotating images with text and stickers. Ideophones, the mimetic and onomatopoeic words common in graphic novels, have yet to be explored for photo annotation. We present a method that automatically recommends ideophones and positions them as text on photos. The system obtains a list of ideophones with English definitions and applies a suite of visual object detectors to the image; a semantic embedding then maps the detected objects to relevant ideophones. Our system stands in contrast to traditional computer-vision annotation systems, which stop at recommending object- and scene-level labels, by providing annotations that are communicative, fun, and engaging. We test these annotations in Japanese and find that they are strongly preferred over unannotated and object-annotated photos, and that they increase enjoyment and the likelihood of sharing.
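To make the recommendation step concrete, here is a minimal sketch of how detector labels could be matched to ideophones via word embeddings, assuming pretrained GloVe vectors on disk and a tiny hypothetical ideophone lexicon with English definitions. The file path, lexicon, and function names are illustrative assumptions, not the authors' implementation; averaging word vectors and ranking by cosine similarity is just one simple way to realize the semantic mapping the abstract describes.

```python
# Sketch: map object-detector labels to ideophones by embedding similarity.
# Assumes a GloVe text file (e.g. glove.6B.300d.txt) is available locally.
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a whitespace-separated text file."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vecs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vecs

def embed(words, vecs):
    """Average the embeddings of the in-vocabulary words."""
    hits = [vecs[w] for w in words if w in vecs]
    return np.mean(hits, axis=0) if hits else None

def rank_ideophones(detected_labels, ideophones, vecs, top_k=3):
    """Rank ideophones by cosine similarity between the embedded
    detector labels and each ideophone's embedded English definition."""
    query = embed(detected_labels, vecs)
    scored = []
    for word, definition in ideophones.items():
        target = embed(definition.lower().split(), vecs)
        if query is None or target is None:
            continue
        sim = float(query @ target /
                    (np.linalg.norm(query) * np.linalg.norm(target)))
        scored.append((sim, word))
    return sorted(scored, reverse=True)[:top_k]

# Hypothetical inputs: labels from an object detector and a toy lexicon.
vecs = load_glove("glove.6B.300d.txt")
ideophones = {
    "goro-goro": "the sound of thunder rumbling or something heavy rolling",
    "kira-kira": "sparkling glittering shining light",
    "wan-wan": "the sound of a dog barking",
}
print(rank_ideophones(["dog", "grass", "park"], ideophones, vecs))
```

With the labels `["dog", "grass", "park"]`, a sketch like this would be expected to rank "wan-wan" highest, since its definition shares vocabulary with the detected objects.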