Crowdsourcing authoring of sensory effects on videos
2019, Multimedia Tools and Applications
https://doi.org/10.1007/s11042-019-7312-2

Abstract
Human perception is inherently multisensorial, involving the five traditional senses: sight, hearing, touch, taste, and smell. In contrast to traditional multimedia, which is based on audio and visual stimuli, mulsemedia seeks to stimulate all the human senses. One way to produce multisensorial content is to author videos with sensory effects. These effects are represented as metadata attached to the video content, which are processed and rendered in the user's environment through physical devices. However, creating sensory effect metadata is not a trivial activity, because authors have to carefully identify different details in a scene, such as the exact points where each effect starts and finishes, as well as its presentation features, such as intensity and direction. It is a subjective task that requires accurate human perception and time. In this article, we aim to find out whether a crowdsourcing approach is suitable for authoring coherent sensory effects associated with video content. Our belief is that combining the collective common sense of a crowd, which indicates the time intervals of sensory effects, with expert fine-tuning is a viable way to generate sensory effects from the users' point of view. To carry out the experiment, we selected three videos from a public mulsemedia dataset and sent them to the crowd through a cascading microtask approach. The results showed that the crowd can indicate intervals in which users agree that sensory effects should be inserted, revealing a way of sharing authoring between the author and the crowd.
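Sensory effect metadata of the kind described above is commonly expressed in the MPEG-V Sensory Effect Description Language (SEDL). The fragment below is an illustrative sketch only, not a schema-validated document: the element and attribute names (e.g. `WindType`, `si:pts`, `intensity-value`) approximate the general shape of the MPEG-V SEDL/SEV vocabulary, and exact namespaces and attributes should be checked against the standard.

```xml
<!-- Illustrative sketch of MPEG-V style sensory-effect metadata.
     Namespaces and attribute names approximate the SEDL/SEV schemas
     and are not guaranteed to validate against the standard. -->
<sedl:SEM xmlns:sedl="urn:mpeg:mpeg-v:2010:01-SEDL-NS"
          xmlns:sev="urn:mpeg:mpeg-v:2010:01-SEV-NS"
          xmlns:si="urn:mpeg:mpeg-21:2003:01-DIA-XSI-NS"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- A wind effect activated at ~5 s into the video, at 60% intensity -->
  <sedl:Effect xsi:type="sev:WindType" activate="true"
               si:pts="5000" duration="3000"
               intensity-value="0.6" intensity-range="0.0 1.0"/>
  <!-- The same effect deactivated at ~8 s -->
  <sedl:Effect xsi:type="sev:WindType" activate="false" si:pts="8000"/>
</sedl:SEM>
```

The start time (`si:pts`), duration, and intensity values in a fragment like this are exactly the subjective details an author must pinpoint by watching the scene, which is the part of the task the crowdsourcing approach is meant to share.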
- Marcello Novaes de Amorim is a doctoral candidate in the Computer Science Department at the Federal University of Espírito Santo (UFES), Brazil. He received the B.Sc. degree in Computer Science from UFES, Brazil, in 2005, and the M.Sc. degree in Computer Science from the Federal University of Espírito Santo, Brazil, in 2007. His research interests include multimedia systems, human computation, and crowdsourcing. Contact him at novaes@inf.ufes.br.
- Estêvão Bissoli Saleme is currently a Ph.D. candidate in Computer Science at the Federal University of Espírito Santo (UFES), Brazil. From August 2018 to February 2019, he was an Academic Visitor at Brunel University London, UK. He received the B.Sc. degree in Information Systems from FAESA, Brazil, in 2008, and the M.Sc. degree in Computer Science from UFES in 2015. His current research interests include multimedia/mulsemedia systems, middleware and frameworks, interactive multimedia, and media transport and delivery. Contact him at estevaobissoli@gmail.com.
- Fábio Ribeiro de Assis Neto earned the B.Sc. and M.Sc. degrees in Computer Science from the Federal University of Espírito Santo (UFES), Brazil, in 2012 and 2017, respectively. His research interests include crowdsourcing, multimedia systems, and human computation. Contact him at fabio.ribeiro.neto@gmail.com.
- Dr. Celso A. S. Santos is a Professor in the Department of Informatics at the Federal University of Espírito Santo (UFES), Brazil. He received the B.S. degree in Electrical Engineering from UFES in 1991, and the M.S. degree in Electrical Engineering (Electronic Systems) from the Polytechnic School of the University of São Paulo, Brazil, in 1994. In 1999, he received his doctoral degree in Informatique Fondamentale et Parallélisme from Université Paul Sabatier (Toulouse III), France. His recent research interests focus on multimedia/mulsemedia systems and applications, synchronization, and crowdsourcing systems. Contact him at saibel@inf.ufes.br.