Real-time rendering of decorative sound textures for soundscapes
2020, ACM Transactions on Graphics
https://doi.org/10.1145/3414685.3417875
Abstract
Audio recordings contain rich information about sound sources and their properties, such as the location, loudness, and frequency of events. One prevalent component in sound recordings is the sound texture, which contains a massive number of events. Such a texture can include distinct, repeated sounds, which we term foreground sounds. Birds chirping in the wind is one such decorative sound texture, with the chirping as the foreground sound and the wind as the background texture. To render these decorative sound textures in real time and with high quality, we create two-layer Markov models to enable smooth transitions from sound grain to sound grain and propose a hierarchical scheme to generate Head-Related Transfer Function filters for localization cues of sounds represented as area/volume sources. Moreover, during the synthesis stage, we provide control over the frequency and intensity of sounds for customization. Lastly, foreground sounds are often blended into background...
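As a rough illustration of grain sequencing with a two-layer Markov model, the sketch below samples an upper-layer chain over grain clusters and a lower-layer chain over grains within the current cluster. It is not the paper's model: the cluster names, grain labels, transition probabilities, and the rule for entering a new cluster are all hypothetical, chosen only to show the layered sampling idea.

```python
import random

# Hypothetical upper layer: transition probabilities between grain clusters.
CLUSTER_TRANSITIONS = {
    "calm": {"calm": 0.8, "gusty": 0.2},
    "gusty": {"calm": 0.4, "gusty": 0.6},
}

# Hypothetical lower layer: transition probabilities between grains
# inside each cluster.
GRAIN_TRANSITIONS = {
    "calm": {"c0": {"c0": 0.1, "c1": 0.9}, "c1": {"c0": 0.7, "c1": 0.3}},
    "gusty": {"g0": {"g0": 0.2, "g1": 0.8}, "g1": {"g0": 0.5, "g1": 0.5}},
}


def sample_next(row, rng):
    """Draw the next state from one row of a transition table."""
    states = list(row)
    weights = [row[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def generate_grain_sequence(length, seed=0):
    """Return a list of (cluster, grain) pairs of the requested length."""
    rng = random.Random(seed)
    cluster, grain = "calm", "c0"
    sequence = []
    for _ in range(length):
        sequence.append((cluster, grain))
        next_cluster = sample_next(CLUSTER_TRANSITIONS[cluster], rng)
        if next_cluster == cluster:
            # Stay in the cluster: follow the lower-layer chain.
            grain = sample_next(GRAIN_TRANSITIONS[cluster][grain], rng)
        else:
            # Assumption: on entering a new cluster, start from a
            # uniformly chosen grain of that cluster.
            grain = rng.choice(sorted(GRAIN_TRANSITIONS[next_cluster]))
        cluster = next_cluster
    return sequence


if __name__ == "__main__":
    for cluster, grain in generate_grain_sequence(8):
        print(cluster, grain)
```

In a real synthesizer, each sampled grain label would index an audio segment, and consecutive segments would typically be overlapped and crossfaded; that step is omitted here.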