Audio source separation into the wild

2019, Elsevier eBooks

https://doi.org/10.1016/B978-0-12-814601-9.00022-5

Abstract

This review chapter is dedicated to multichannel audio source separation in real-life environments. We survey the major achievements in the field and discuss some of the remaining challenges, covering several practically important scenarios: moving sources and/or microphones, varying numbers of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. We also discuss applications such as smart assistants, cellular phones, hearing aids, and robots, and conclude the chapter with our perspectives on the future of the field.
