
Sound Source Separation

44 papers
68 followers

About this topic
Sound Source Separation is a computational technique in audio signal processing that aims to isolate individual sound sources from a mixed audio signal. This process involves analyzing the characteristics of the sound waves to distinguish and extract distinct audio components, facilitating applications in music, speech recognition, and environmental sound analysis.

Key research themes

1. How can multichannel recordings and spatial modeling improve audio source separation in real-world environments?

This research theme focuses on leveraging multichannel audio data, spatial filtering, and modeling of acoustic environments to improve the separation of overlapping audio sources in natural, reverberant, and complex settings. It is motivated by practical applications such as hearing aids, smart assistants, and telecommunication, where recordings occur outside controlled laboratory conditions. Challenges addressed include moving sources, varying numbers of sources and sensors, reverberation, synchronization, and spatial diffusion of sound sources.

Key finding: This work provides an extensive overview and analysis of multichannel audio source separation (MASS) techniques applied in real-world, uncontrolled environments rather than idealized laboratory conditions. It highlights that...
Key finding: This paper introduces probabilistic priors on the reverberation characteristics of mixing filters within a multichannel audio source separation framework, modeling early reverberation as autoregressive and late reverberation...
Key finding: This study proposes a novel multichannel audio source separation approach that processes projections of multichannel signals onto various spatial directions instead of directly handling inter-channel covariance matrices. By...

2. What biologically-inspired and deep learning methodologies can enhance sound source segregation and separation in complex acoustic scenes?

This line of research investigates algorithms that mimic human auditory processing and leverage advanced neural network architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and deep ensemble models to perform sound segregation, particularly in challenging scenarios like the Cocktail Party Problem. This theme emphasizes physiological plausibility, feature complementarity, unsupervised/self-supervised learning, and neural architectures tailored for improved separation and robustness to realistic audio mixtures including speech and music sources.

Key finding: This paper presents a binaural sound segregation algorithm based on a hierarchical neural network model inspired by the barn owl auditory system. The algorithm generates neural spike representations tuned to spatial locations...
Key finding: This research introduces an ensemble deep neural network architecture that simultaneously exploits complementary acoustic features extracted from raw single-channel audio to estimate ideal binary masks for source separation....
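The ideal binary mask targeted by such networks can be illustrated with a minimal numerical sketch. The toy magnitude spectrograms below are hypothetical values, not taken from any paper listed here; in practice the mask must be estimated from the mixture alone, since the isolated sources are unavailable at test time:

```python
import numpy as np

def ideal_binary_mask(mag_target, mag_interference):
    """Keep a time-frequency bin iff the target dominates it."""
    return (mag_target >= mag_interference).astype(float)

# Toy magnitude spectrograms (frequency x time), hypothetical values.
speech = np.array([[3.0, 0.1],
                   [0.2, 2.0]])
noise  = np.array([[0.5, 1.0],
                   [1.5, 0.3]])

mask = ideal_binary_mask(speech, noise)
mixture = speech + noise    # magnitudes only add approximately in reality
estimate = mask * mixture   # masked mixture approximates the target
print(mask)                 # [[1. 0.], [0. 1.]]
```

At oracle level, the ideal binary mask gives an upper bound on mask-based separation quality; a separation network learns to approximate it from features of the mixture.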

3. How can sound source separation be integrated with sound event detection to improve recognition in noisy and polyphonic environments?

This theme explores the synergy between source separation and sound event detection (SED), particularly for domestic and real-world applications where overlapping events and noise interfere with detection accuracy. It includes joint training frameworks, pre-processing separation to de-mix sounds before event classification, and analytical evaluation of event detection improvements facilitated by separated sources. The approaches contribute to semi-supervised learning, leveraging unlabeled data, and improving interpretability and robustness of SED systems by integrating source separation.

Key finding: This work presents Joint Source Separation and Sound Event Detection (JSS), a joint training scheme that improves polyphonic SED performance by leveraging source separation to disentangle overlapping sound events in domestic...

All papers in Sound Source Separation

We have developed a self-propelling robotic pet, in which the robot audition software HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) was installed to equip it with sound source localization functions, thus...
Phase recovery of modified spectrograms is a major issue in audio signal processing applications, such as source separation. This paper introduces a novel technique for estimating the phases of components in complex mixtures within onset...
In this paper we present a novel source separation method aiming to overcome the difficulty of modelling non-stationary signals. The method can be applied to mixtures of musical instruments with frequency and/or amplitude modulation, e.g....
Source separation, which consists in decomposing data into meaningful structured components, is an active research topic in many fields including music signal processing. In this paper, we introduce the Positive α-stable (PαS)...
This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals. From the analysis of mixtures of sinusoids we obtain relationships between phases of successive time frames in the...
Time is an important dimension in sound event detection (SED) systems. However, the metrics used to evaluate SED systems are taken directly from the classical machine learning domain and are not well adapted to the needs of these...
Voice data plays a significant role in the current era, and speech-based communication is becoming increasingly prevalent. This involves the utilization of software for sending voice messages and controlling various settings...
Techniques based on non-negative matrix factorization (NMF) can be used to efficiently decompose a magnitude spectrogram into a set of template (column) vectors and activation (row) vectors. To better control this decomposition, NMF has...
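The template/activation decomposition described above can be sketched with multiplicative updates for the Euclidean cost on a toy spectrogram. This is a minimal illustration under assumed values; the helper function and the matrix `V` are hypothetical, not from the paper:

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x time) into template
    vectors W (freq x k) and activation vectors H (k x time) using
    multiplicative updates that minimize ||V - WH||_F^2."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # update activations
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)  # update templates
    return W, H

# Toy magnitude spectrogram: two spectral templates active at different times.
V = np.outer([1.0, 0.0, 2.0, 0.0], [1, 1, 0, 0, 0]) \
  + np.outer([0.0, 3.0, 0.0, 1.0], [0, 0, 1, 1, 1])
W, H = nmf(V, k=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # small relative error
```

Because each column of W is a spectral template and each row of H its activation over time, separating one component amounts to reconstructing `W[:, i:i+1] @ H[i:i+1, :]` and converting it back to a signal, typically via masking.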
In recent years, the relation between Sound Event Detection (SED) and Source Separation (SSep) has received growing interest, in particular with the aim of enhancing the performance of SED by leveraging the synergies between both tasks....
Separation of music into instruments ("bass", "drums", "other", "vocals"). Two network architectures are described, feed-forward and recurrent; each of them yields state-of-the-art results on SiSEC DSD100. We blend both architectures to...
In the field of Music Information Retrieval (MIR), the automated detection of the singing voice within a given music recording constitutes a challenging and important research problem. The goal of this task is to find those segments...
Recent advances have made it possible to create deep complex-valued neural networks. Despite this progress, many challenging learning tasks have yet to leverage the power of complex representations. Building on recent advances, we propose...
The decomposition of a music audio signal into its vocal and backing track components is analogous to image-to-image translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the...
Music source separation with deep neural networks typically relies only on amplitude features. In this paper we show that additional phase features can improve the separation performance. Using the theoretical relationship between STFT...
A growing need for on-device machine learning has led to an increased interest in lightweight neural networks that lower model complexity while retaining performance. While a variety of general-purpose techniques exist in this context,...
Generative audio models based on neural networks have led to considerable improvements across fields including speech enhancement, source separation, and text-to-speech synthesis. These systems are typically trained in a supervised...
Kernel Additive Modelling (KAM) is a framework for source separation aiming to explicitly model inherent properties of sound sources to help with their identification and separation. KAM separates a given source by applying robust...
Polyphonic vocal recordings are an inherently challenging source separation task due to the melodic structure of the vocal parts and unique timbre of its constituents. In this work we utilise a time-domain neural network architecture...
Deep neural network algorithms have recently emerged as a promising technique for music source separation. In existing methods that rely on deep learning algorithms, billions of parameters must be trained. In this paper, we propose a...
A fundamental task in signal processing, speech separation has many practical applications. For example, it can be used to improve the accuracy of automatic speech recognition by separating clear speech from noisy speech signals. When all...
The removal of background noise from speech audio is a problem with high practical relevance. A variety of deep learning approaches have been applied to it in recent years, most of which operate on a magnitude spectrogram representation...
Close miking represents a widely employed practice of placing a microphone very near to the sound source in order to capture more direct sound and minimize any pickup of ambient sound, including other, concurrently active sources. It is...
Monaural singing voice separation task focuses on the prediction of the singing voice from a single channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep learning...
In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation. We build upon a previously proposed method for learning representations of time-domain music signals...
In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on the denoising auto-encoder model that uses a...
The objective of deep learning methods based on encoder-decoder architectures for music source separation is to approximate either ideal time-frequency masks or spectral representations of the target music source(s). The spectral...
The goal of this work is to investigate what singing voice separation approaches based on neural networks learn from the data. We examine the mapping functions of neural networks based on the denoising autoencoder (DAE) model that are...
This paper introduces an algorithm for time-scale modification of audio signals based on using non-negative matrix factorization. The activation signals attributed to the detected components are used for identifying sound events. The...
Timbre transfer techniques aim at converting the sound of a musical piece generated by one instrument into the same one as if it was played by another instrument, while maintaining as much as possible the content in terms of musical...
This research paper presents a novel deep learning-based neural network architecture, named Y-Net, for achieving music source separation. The proposed architecture performs end-to-end hybrid source separation by extracting features from...
"Aurally Informed Performance" for mobile robots operating in natural environments brings difficult challenges, such as: localizing sound sources all around the robot; tracking these sources as they or the robot move; separating the sources...
Artificial audition aims at providing hearing capabilities to machines, computers and robots. Existing frameworks in robot audition offer interesting sound source localization, tracking and separation performance, although they involve a...
by Or Tal
We present AERO, an audio super-resolution model that processes speech and music signals in the spectral domain. AERO is based on an encoder-decoder architecture with U-Net-like skip connections. We optimize the model using both time and...
In this paper we study deep learning-based music source separation, and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is in demonstrating that adding a...
Music source separation is one of the long-standing and challenging problems in the music information retrieval community. Improvements in deep learning have led to substantial progress in decomposing music into its constitutive components across a variety of music....
Electronic Music (EM) is a popular family of genres which has increasingly received attention as a research subject in the field of MIR. A fundamental structural unit in EM are loops-audio fragments whose length can span several seconds....
Music source separation aims at decomposing music recordings into their constituent component signals. Many existing techniques are based on separating a time-frequency representation of the mixture signal by applying suitable modeling...
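Many of the masking-based techniques referenced throughout this list share a common final step: a Wiener-like soft mask built from modeled source spectrograms. A minimal sketch with toy power spectrograms (hypothetical values, not from any specific paper):

```python
import numpy as np

def soft_mask(power_target, power_residual, eps=1e-12):
    """Wiener-like ratio mask: per-bin fraction of power assigned to the target."""
    return power_target / (power_target + power_residual + eps)

# Toy power spectrograms (frequency x time), hypothetical values.
P_vocals = np.array([[4.0, 0.0],
                     [1.0, 9.0]])
P_accomp = np.array([[4.0, 2.0],
                     [3.0, 0.0]])

M = soft_mask(P_vocals, P_accomp)
print(M)  # ≈ [[0.5, 0.], [0.25, 1.]]
```

The mask is applied bin-wise to the complex mixture STFT and the source estimate is obtained by the inverse transform; unlike a binary mask, bins shared by both sources are split proportionally rather than assigned to a single source.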
Your feedback is welcome! We did our best to be as precise, informative and to the point as possible, but should there be anything you feel might be an error or could be rephrased to be more precise or comprehensible, please don't...
Convolutional layers are an integral part of many deep neural network solutions in computer vision. Recent work shows that replacing the standard convolution operation with mechanisms based on self-attention leads to improved performance...
The advent of deep learning has led to the prevalence of deep neural network architectures for monaural music source separation, with end-to-end approaches that operate directly on the waveform level increasingly receiving research...
The ultimate long-term goal in Human-Robot Interaction (HRI) is to design robots that can act as a natural extension to humans. This requires the design of robot control architectures to provide structure for the integration of the...
To demonstrate the influence of an artificial audition system on speech recognition and dialogue management for a robot, this paper presents a case study involving soft coupling of ManyEars, a sound source localization, tracking and...
Robots are usually equipped with advanced capabilities in order to autonomously adapt to real and dynamic environments and to interact with humans. Robot Perception is being inspired by new embodied cognition approaches that redefine the...
Abstract: Recently, score-informed non-negative matrix factorization (SI-NMF) has been used for separating the sounds of musical instruments. Considering this application,...
In recent years, the use of Deep Learning techniques in audio signal processing has led the scientific community to develop machine learning strategies that allow to build efficient representations from raw waveforms for machine hearing...