Crowdsourcing authoring of sensory effects on videos
2019, Multimedia Tools and Applications
https://doi.org/10.1007/s11042-019-7312-2

Abstract
Human perception is inherently multisensorial, involving the five traditional senses: sight, hearing, touch, taste, and smell. In contrast to traditional multimedia, which is based on audio and visual stimuli, mulsemedia seeks to stimulate all the human senses. One way to produce multisensorial content is to author videos with sensory effects. These effects are represented as metadata attached to the video content, which are processed and rendered in the user's environment through physical devices. However, creating sensory effect metadata is not a trivial activity, because authors have to carefully identify different details in a scene, such as the exact points where each effect starts and finishes, as well as its presentation features, such as intensity and direction. It is a subjective task that requires accurate human perception and time. In this article, we aim to find out whether a crowdsourcing approach is suitable for authoring coherent sensory effects associated with video content. Our belief is that combining the collective common sense of a crowd, which indicates the time intervals of sensory effects, with expert fine-tuning is a viable way to generate sensory effects from the users' point of view. To carry out the experiment, we selected three videos from a public mulsemedia dataset and sent them to the crowd through a cascading microtask approach. The results showed that the crowd can indicate intervals in which users agree that sensory effects should be inserted, revealing a way of sharing authoring between the author and the crowd.
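Sensory effect metadata of the kind described above is commonly expressed in the MPEG-V Sensory Effect Description Language (SEDL). The fragment below is an illustrative sketch only, not a schema-validated document: the element and attribute names (e.g. `WindType`, `si:pts`, `intensity-value`) approximate the general shape of the MPEG-V SEDL/SEV vocabulary, and exact namespaces and attributes should be checked against the standard.

```xml
<!-- Illustrative sketch of MPEG-V style sensory-effect metadata.
     Namespaces and attribute names approximate the SEDL/SEV schemas
     and are not guaranteed to validate against the standard. -->
<sedl:SEM xmlns:sedl="urn:mpeg:mpeg-v:2010:01-SEDL-NS"
          xmlns:sev="urn:mpeg:mpeg-v:2010:01-SEV-NS"
          xmlns:si="urn:mpeg:mpeg-21:2003:01-DIA-XSI-NS"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- A wind effect activated at ~5 s into the video, at 60% intensity -->
  <sedl:Effect xsi:type="sev:WindType" activate="true"
               si:pts="5000" duration="3000"
               intensity-value="0.6" intensity-range="0.0 1.0"/>
  <!-- The same effect deactivated at ~8 s -->
  <sedl:Effect xsi:type="sev:WindType" activate="false" si:pts="8000"/>
</sedl:SEM>
```

The start time (`si:pts`), duration, and intensity values in a fragment like this are exactly the subjective details an author must pinpoint by watching the scene, which is the part of the task the crowdsourcing approach is meant to share.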
- Marcello Novaes de Amorim is a doctoral candidate in the Computer Science Department at the Federal University of Espírito Santo (UFES), Brazil. He received the B.Sc. degree in Computer Science from UFES, Brazil, in 2005, and the M.Sc. degree in Computer Science from the Federal University of Espírito Santo, Brazil, in 2007. His research interests include multimedia systems, human computation, and crowdsourcing. Contact him at novaes@inf.ufes.br.
- Estêvão Bissoli Saleme is currently a Ph.D. candidate in Computer Science at the Federal University of Espírito Santo (UFES), Brazil. From August 2018 to February 2019, he was an Academic Visitor at Brunel University London, UK. He received the B.Sc. degree in Information Systems from FAESA, Brazil, in 2008, and the M.Sc. degree in Computer Science from UFES in 2015. His current research interests include multimedia/mulsemedia systems, middleware and frameworks, interactive multimedia, and media transport and delivery. Contact him at estevaobissoli@gmail.com.
- Fábio Ribeiro de Assis Neto earned the B.Sc. and M.Sc. degrees in Computer Science from the Federal University of Espírito Santo (UFES), Brazil, in 2012 and 2017, respectively. His research interests include crowdsourcing, multimedia systems, and human computation. Contact him at fabio.ribeiro.neto@gmail.com.
- Dr. Celso A. S. Santos is a Professor in the Department of Informatics at the Federal University of Espírito Santo (UFES), Brazil. He received the B.S. degree in Electrical Engineering from UFES in 1991, and the M.S. degree in Electrical Engineering (Electronic Systems) from the Polytechnic School of the University of São Paulo, Brazil, in 1994. In 1999, he received his doctoral degree in Informatique Fondamentale et Parallélisme from Université Paul Sabatier (Toulouse III), France. His recent research interests focus on multimedia/mulsemedia systems and applications, synchronization, and crowdsourcing systems. Contact him at saibel@inf.ufes.br.