Deep Prior Approach for Room Impulse Response Reconstruction
Sensors
https://doi.org/10.3390/S22072710
Abstract
In this paper, we propose a data-driven approach for the reconstruction of unknown room impulse responses (RIRs) based on the deep prior paradigm. We formulate RIR reconstruction as an inverse problem. More specifically, a convolutional neural network (CNN) is employed as a prior in order to obtain a regularized solution to the RIR reconstruction problem for uniform linear arrays. This approach allows us to avoid the assumptions on sound wave propagation, acoustic environment, or measurement setting made in state-of-the-art RIR reconstruction algorithms. Moreover, unlike classical deep learning solutions in the literature, the deep prior approach employs per-element training. Therefore, the proposed method does not require training data sets, and it can be applied to RIRs independently of available data or environments. Results on simulated data demonstrate that the proposed technique is able to provide accurate results in a wide range of scenarios, including variable direction ...
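The abstract states the idea only at a high level. One common way to cast it, assumed here, is as follows. The RIRs measured by a uniform linear array form a matrix y (microphones by time samples) in which some rows are missing. A binary mask M marks the measured microphones. Instead of training on a data set, a randomly initialized CNN f_θ with a fixed random input z is fitted to this single matrix by minimizing 0.5‖M ⊙ (f_θ(z) − y)‖² over the weights θ. The network output f_θ(z) then also fills in the missing rows. This per-element fit is what "deep prior" refers to. The sketch below is a minimal instance of that generic recipe, and every concrete choice in it is a hypothetical illustration rather than the paper's setup: the tiny two-layer 3×3 convolutional network with LeakyReLU (gradients written by hand so that only NumPy is needed), the Adam settings, the toy pulse data, and the names `deep_prior_reconstruct`, `forward`, `backward`, `im2col`, and `col2im`.

```python
# Hypothetical deep prior sketch: a hand-written NumPy CNN, not the paper's network.
import numpy as np


def im2col(x, k):
    """Stack the k x k neighbourhoods of x (C, H, W), zero-padded to 'same' size."""
    c, h, w = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    cols = np.empty((c, k, k, h, w))
    for i in range(k):
        for j in range(k):
            cols[:, i, j] = xp[:, i:i + h, j:j + w]
    return cols.reshape(c * k * k, h * w)


def col2im(cols, shape, k):
    """Adjoint of im2col: scatter-add patch gradients back onto a (C, H, W) grid."""
    c, h, w = shape
    p = k // 2
    cols = cols.reshape(c, k, k, h, w)
    xp = np.zeros((c, h + 2 * p, w + 2 * p))
    for i in range(k):
        for j in range(k):
            xp[:, i:i + h, j:j + w] += cols[:, i, j]
    return xp[:, p:p + h, p:p + w]


def forward(params, z, k, slope=0.1):
    """Two 'same' convolutions with a LeakyReLU in between: (C, H, W) -> (H, W)."""
    w1, w2 = params
    _, h, w = z.shape
    cols1 = im2col(z, k)
    pre = (w1.reshape(w1.shape[0], -1) @ cols1).reshape(w1.shape[0], h, w)
    act = np.where(pre > 0, pre, slope * pre)
    cols2 = im2col(act, k)
    out = (w2.reshape(1, -1) @ cols2).reshape(h, w)
    return out, (cols1, pre, act, cols2)


def backward(params, cache, dout, k, slope=0.1):
    """Gradients of the loss w.r.t. both kernels, given dL/d(out)."""
    w1, w2 = params
    cols1, pre, act, cols2 = cache
    d = dout.reshape(1, -1)
    g2 = (d @ cols2.T).reshape(w2.shape)
    dact = col2im(w2.reshape(1, -1).T @ d, act.shape, k)
    dpre = dact * np.where(pre > 0, 1.0, slope)
    g1 = (dpre.reshape(w1.shape[0], -1) @ cols1.T).reshape(w1.shape)
    return [g1, g2]


def deep_prior_reconstruct(y, mask, channels=8, k=3, iters=300, lr=1e-2, seed=0):
    """Fit a randomly initialised CNN to the observed entries of y only.

    y:    (microphones, time) RIR matrix; rows of unobserved mics can hold anything.
    mask: same shape, 1 where a microphone was measured, 0 where it is missing.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((channels,) + y.shape)   # fixed random network input
    params = [0.1 * rng.standard_normal((channels, channels, k, k)),
              0.1 * rng.standard_normal((1, channels, k, k))]
    m = [np.zeros_like(p) for p in params]           # Adam first moments
    v = [np.zeros_like(p) for p in params]           # Adam second moments
    b1, b2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, iters + 1):
        out, cache = forward(params, z, k)
        resid = mask * (out - y)                     # data term on observed mics only
        losses.append(0.5 * float(np.sum(resid ** 2)))
        grads = backward(params, cache, resid, k)    # binary mask: dL/d(out) = resid
        for p, g, mi, vi in zip(params, grads, m, v):
            mi[:] = b1 * mi + (1 - b1) * g
            vi[:] = b2 * vi + (1 - b2) * g ** 2
            p -= lr * (mi / (1 - b1 ** t)) / (np.sqrt(vi / (1 - b2 ** t)) + eps)
    out, _ = forward(params, z, k)
    return out, losses


# Hypothetical toy data: a Gaussian pulse whose arrival time grows linearly
# along a 16-microphone uniform linear array (64 time samples per RIR).
mics, samples = 16, 64
t_idx = np.arange(samples)
y_true = np.stack([np.exp(-((t_idx - (10 + m)) / 1.5) ** 2) for m in range(mics)])
mask = np.zeros_like(y_true)
mask[::2] = 1.0                                      # only every other mic measured
rir_hat, losses = deep_prior_reconstruct(mask * y_true, mask)
missing_err = np.linalg.norm((rir_hat - y_true)[1::2]) / np.linalg.norm(y_true[1::2])
print(f"fit loss {losses[0]:.3f} -> {losses[-1]:.3f}, missing-mic rel. error {missing_err:.3f}")
```

Because the mask restricts the data term to the measured microphones, the missing rows are constrained only by the convolutional structure of the network. That structure is the "prior". In deep prior methods, the number of iterations also acts as an implicit regularizer, so `iters` is a parameter to tune rather than to maximize.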