
Sound Source Separation

44 papers
68 followers

About this topic
Sound Source Separation is a computational technique in audio signal processing that aims to isolate individual sound sources from a mixed audio signal. This process involves analyzing the characteristics of the sound waves to distinguish and extract distinct audio components, facilitating applications in music, speech recognition, and environmental sound analysis.

Key research themes

1. How can multichannel recordings and spatial modeling improve audio source separation in real-world environments?

This research theme focuses on leveraging multichannel audio data, spatial filtering, and modeling of acoustic environments to improve the separation of overlapping audio sources in natural, reverberant, and complex settings. It is motivated by practical applications such as hearing aids, smart assistants, and telecommunication, where recordings occur outside controlled laboratory conditions. Challenges addressed include moving sources, varying numbers of sources and sensors, reverberation, synchronization, and spatial diffusion of sound sources.

Key finding: This work provides an extensive overview and analysis of multichannel audio source separation (MASS) techniques applied in real-world, uncontrolled environments rather than idealized laboratory conditions. It highlights that...
Key finding: This paper introduces probabilistic priors on the reverberation characteristics of mixing filters within a multichannel audio source separation framework, modeling early reverberation as autoregressive and late reverberation...
Key finding: This study proposes a novel multichannel audio source separation approach that processes projections of multichannel signals onto various spatial directions instead of directly handling inter-channel covariance matrices. By...

2. What biologically-inspired and deep learning methodologies can enhance sound source segregation and separation in complex acoustic scenes?

This line of research investigates algorithms that mimic human auditory processing and leverage advanced neural network architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and deep ensemble models to perform sound segregation, particularly in challenging scenarios like the Cocktail Party Problem. This theme emphasizes physiological plausibility, feature complementarity, unsupervised/self-supervised learning, and neural architectures tailored for improved separation and robustness to realistic audio mixtures including speech and music sources.

Key finding: This paper presents a binaural sound segregation algorithm based on a hierarchical neural network model inspired by the barn owl auditory system. The algorithm generates neural spike representations tuned to spatial locations...
Key finding: This research introduces an ensemble deep neural network architecture that simultaneously exploits complementary acoustic features extracted from raw single-channel audio to estimate ideal binary masks for source separation....
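The ideal binary mask targeted by such networks can be illustrated with a minimal numerical sketch. The toy magnitude spectrograms below are hypothetical values, not taken from any paper listed here; in practice the mask must be estimated from the mixture alone, since the isolated sources are unavailable at test time:

```python
import numpy as np

def ideal_binary_mask(mag_target, mag_interference):
    """Keep a time-frequency bin iff the target dominates it."""
    return (mag_target >= mag_interference).astype(float)

# Toy magnitude spectrograms (frequency x time), hypothetical values.
speech = np.array([[3.0, 0.1],
                   [0.2, 2.0]])
noise  = np.array([[0.5, 1.0],
                   [1.5, 0.3]])

mask = ideal_binary_mask(speech, noise)
mixture = speech + noise    # magnitudes only add approximately in reality
estimate = mask * mixture   # masked mixture approximates the target
print(mask)                 # [[1. 0.], [0. 1.]]
```

At oracle level, the ideal binary mask gives an upper bound on mask-based separation quality; a separation network learns to approximate it from features of the mixture.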

3. How can sound source separation be integrated with sound event detection to improve recognition in noisy and polyphonic environments?

This theme explores the synergy between source separation and sound event detection (SED), particularly for domestic and real-world applications where overlapping events and noise interfere with detection accuracy. It includes joint training frameworks, pre-processing separation to de-mix sounds before event classification, and analytical evaluation of event detection improvements facilitated by separated sources. The approaches contribute to semi-supervised learning, leveraging unlabeled data, and improving interpretability and robustness of SED systems by integrating source separation.

Key finding: This work presents Joint Source Separation and Sound Event Detection (JSS), a joint training scheme that improves polyphonic SED performance by leveraging source separation to disentangle overlapping sound events in domestic...

All papers in Sound Source Separation

We have developed a self-propelling robotic pet, in which the robot audition software HARK (Honda Research Institute Japan Audition for Robots with Kyoto University) was installed to equip it with sound source localization functions, thus...
Phase recovery of modified spectrograms is a major issue in audio signal processing applications, such as source separation. This paper introduces a novel technique for estimating the phases of components in complex mixtures within onset...
In this paper we present a novel source separation method aiming to overcome the difficulty of modelling non-stationary signals. The method can be applied to mixtures of musical instruments with frequency and/or amplitude modulation, e.g....
Source separation, which consists in decomposing data into meaningful structured components, is an active research topic in many fields including music signal processing. In this paper, we introduce the Positive α-stable (PαS)...
This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals. From the analysis of mixtures of sinusoids we obtain relationships between phases of successive time frames in the...
Time is an important dimension in sound event detection (SED) systems. However, the metrics used to evaluate SED systems are taken directly from the classical machine learning domain and are not well adapted to the needs of these...
Voice data plays a significant role in the current era, and speech-based communication is becoming increasingly prevalent. This involves the utilization of software for sending voice messages and controlling various settings...
Techniques based on non-negative matrix factorization (NMF) can be used to efficiently decompose a magnitude spectrogram into a set of template (column) vectors and activation (row) vectors. To better control this decomposition, NMF has...
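The template/activation decomposition described above can be sketched with multiplicative updates for the Euclidean cost on a toy spectrogram. This is a minimal illustration under assumed values; the helper function and the matrix `V` are hypothetical, not from the paper:

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x time) into template
    vectors W (freq x k) and activation vectors H (k x time) using
    multiplicative updates that minimize ||V - WH||_F^2."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # update activations
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)  # update templates
    return W, H

# Toy magnitude spectrogram: two spectral templates active at different times.
V = np.outer([1.0, 0.0, 2.0, 0.0], [1, 1, 0, 0, 0]) \
  + np.outer([0.0, 3.0, 0.0, 1.0], [0, 0, 1, 1, 1])
W, H = nmf(V, k=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # small relative error
```

Because each column of W is a spectral template and each row of H its activation over time, separating one component amounts to reconstructing `W[:, i:i+1] @ H[i:i+1, :]` and converting it back to a signal, typically via masking.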
In recent years, the relation between Sound Event Detection (SED) and Source Separation (SSep) has received growing interest, in particular with the aim of enhancing the performance of SED by leveraging the synergies between both tasks....
Separation of music into instruments ("bass", "drums", "other", "vocals"). Two network architectures are described, feed-forward and recurrent; each of them yields state-of-the-art results on SiSEC DSD100. We blend both architectures to...
In the field of Music Information Retrieval (MIR), the automated detection of the singing voice within a given music recording constitutes a challenging and important research problem. The goal of this task is to find those segments...
Recent advances have made it possible to create deep complex-valued neural networks. Despite this progress, many challenging learning tasks have yet to leverage the power of complex representations. Building on recent advances, we propose...
The decomposition of a music audio signal into its vocal and backing track components is analogous to image-to-image translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the...
Music source separation with deep neural networks typically relies only on amplitude features. In this paper we show that additional phase features can improve the separation performance. Using the theoretical relationship between STFT...
A growing need for on-device machine learning has led to an increased interest in lightweight neural networks that lower model complexity while retaining performance. While a variety of general-purpose techniques exist in this context,...
Generative audio models based on neural networks have led to considerable improvements across fields including speech enhancement, source separation, and text-to-speech synthesis. These systems are typically trained in a supervised...
Kernel Additive Modelling (KAM) is a framework for source separation aiming to explicitly model inherent properties of sound sources to help with their identification and separation. KAM separates a given source by applying robust...
Polyphonic vocal recordings are an inherently challenging source separation task due to the melodic structure of the vocal parts and unique timbre of its constituents. In this work we utilise a time-domain neural network architecture...
Deep neural network algorithms have recently emerged as a promising technique for music source separation. In existing methods that rely on deep learning algorithms, billions of parameters must be trained. In this paper, we propose a...
A fundamental task in signal processing, speech separation has many practical applications. For example, it can be used to improve the accuracy of automatic speech recognition by separating clear speech from noisy speech signals. When all...
The removal of background noise from speech audio is a problem with high practical relevance. A variety of deep learning approaches have been applied to it in recent years, most of which operate on a magnitude spectrogram representation...
Close miking represents a widely employed practice of placing a microphone very near to the sound source in order to capture more direct sound and minimize any pickup of ambient sound, including other, concurrently active sources. It is...
Monaural singing voice separation task focuses on the prediction of the singing voice from a single channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep learning...
In this work we present a method for unsupervised learning of audio representations, focused on the task of singing voice separation. We build upon a previously proposed method for learning representations of time-domain music signals...
In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained using unsupervised objectives and relies on the denoising auto-encoder model that uses a...
The objective of deep learning methods based on encoder-decoder architectures for music source separation is to approximate either ideal time-frequency masks or spectral representations of the target music source(s). The spectral...
The goal of this work is to investigate what singing voice separation approaches based on neural networks learn from the data. We examine the mapping functions of neural networks based on the denoising autoencoder (DAE) model that are...
This paper introduces an algorithm for time-scale modification of audio signals based on using non-negative matrix factorization. The activation signals attributed to the detected components are used for identifying sound events. The...
Timbre transfer techniques aim at converting the sound of a musical piece generated by one instrument into the same one as if it was played by another instrument, while maintaining as much as possible the content in terms of musical...
This research paper presents a novel deep learning-based neural network architecture, named Y-Net, for achieving music source separation. The proposed architecture performs end-to-end hybrid source separation by extracting features from...
"Aurally Informed Performance" for mobile robots operating in natural environments brings difficult challenges, such as: localizing sound sources all around the robot; tracking these sources as they or the robot move; separating the sources...
Artificial audition aims at providing hearing capabilities to machines, computers and robots. Existing frameworks in robot audition offer interesting sound source localization, tracking and separation performance, although they involve a...
by Or Tal
We present AERO, an audio super-resolution model that processes speech and music signals in the spectral domain. AERO is based on an encoder-decoder architecture with U-Net-like skip connections. We optimize the model using both time and...
In this paper we study deep learning-based music source separation, and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is in demonstrating that adding a...
Music source separation is one of the long-standing and challenging problems in the music information retrieval community. Improvements in deep learning have led to substantial progress in decomposing music into its constitutive components across a variety of music....
Electronic Music (EM) is a popular family of genres which has increasingly received attention as a research subject in the field of MIR. A fundamental structural unit in EM are loops-audio fragments whose length can span several seconds....
Music source separation aims at decomposing music recordings into their constituent component signals. Many existing techniques are based on separating a time-frequency representation of the mixture signal by applying suitable modeling...
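Many of the masking-based techniques referenced throughout this list share a common final step: a Wiener-like soft mask built from modeled source spectrograms. A minimal sketch with toy power spectrograms (hypothetical values, not from any specific paper):

```python
import numpy as np

def soft_mask(power_target, power_residual, eps=1e-12):
    """Wiener-like ratio mask: per-bin fraction of power assigned to the target."""
    return power_target / (power_target + power_residual + eps)

# Toy power spectrograms (frequency x time), hypothetical values.
P_vocals = np.array([[4.0, 0.0],
                     [1.0, 9.0]])
P_accomp = np.array([[4.0, 2.0],
                     [3.0, 0.0]])

M = soft_mask(P_vocals, P_accomp)
print(M)  # ≈ [[0.5, 0.], [0.25, 1.]]
```

The mask is applied bin-wise to the complex mixture STFT and the source estimate is obtained by the inverse transform; unlike a binary mask, bins shared by both sources are split proportionally rather than assigned to a single source.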
Your feedback is welcome! We did our best to be as precise, informative and to the point as possible, but should there be anything you feel might be an error or could be rephrased to be more precise or comprehensible, please don't...
Convolutional layers are an integral part of many deep neural network solutions in computer vision. Recent work shows that replacing the standard convolution operation with mechanisms based on self-attention leads to improved performance...
The advent of deep learning has led to the prevalence of deep neural network architectures for monaural music source separation, with end-to-end approaches that operate directly on the waveform level increasingly receiving research...
The ultimate long-term goal in Human-Robot Interaction (HRI) is to design robots that can act as a natural extension to humans. This requires the design of robot control architectures to provide structure for the integration of the...
To demonstrate the influence of an artificial audition system on speech recognition and dialogue management for a robot, this paper presents a case study involving soft coupling of ManyEars, a sound source localization, tracking and...
Robots are usually equipped with advanced capabilities in order to autonomously adapt to real and dynamic environments and to interact with humans. Robot Perception is being inspired by new embodied cognition approaches that redefine the...
Abstract: Recently, score-informed non-negative matrix factorization (SI-NMF) has been used for separating the sounds of musical instruments. Considering this application,...
In recent years, the use of Deep Learning techniques in audio signal processing has led the scientific community to develop machine learning strategies that allow to build efficient representations from raw waveforms for machine hearing...