Audio Processing Research Papers

NMF with time-frequency activations to model non stationary audio events

2025

Automatic calibration and equalization of a line array system

by Vesa Välimäki

2025

This paper presents an automated public address (PA) processing unit that uses delay and magnitude-response adjustment. The aim is to achieve a flat frequency response and to align the delays between speakers placed at different physical positions at the...

Using Association Rules Mining for Retrieving Genre-Specific Music Files

by Ismaïl Biskri

2025

Retrieving a music file from a large database is a non-trivial task. To support this task, many mechanisms have been developed over the years. However, indexing files remains one of the most popular mechanisms. Several algorithms allow...

Enhancing Conformer-Based Sound Event Detection Using Frequency Dynamic Convolutions and BEATs Audio Embeddings

by Doroteo Toledano

2025

Over the last few years, most deep learning systems for audio processing tasks have achieved state-of-the-art results using Conformer-based architectures. However, when it comes to sound event detection (SED), it was...

Methodology for Active Noise Control in Ducts (Metodología de Control Activo de Ruido en Ductos)

by Jose N Borja

2025, cybertesis.urp.edu.pe

ABSTRACT: This article presents the design, implementation, and comparison of active noise control techniques. These controllers are based on adaptive control techniques using an FIR filter and the LMS algorithm; the principle of...
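The abstract names an adaptive FIR filter driven by the LMS algorithm. As a rough sketch of that principle only (not the authors' controller), the Python below adapts FIR weights so that the filter output cancels a disturbance derived from a reference noise signal. It assumes an idealized setup with no secondary acoustic path (loudspeaker to error microphone), so it is plain LMS rather than the filtered-x LMS usually needed in a real duct; the duct model `path`, the step size `mu`, and the tap count are hypothetical values.

```python
import random

def lms_cancel(reference, disturbance, num_taps=8, mu=0.01):
    """Adapt an FIR filter with LMS so its output matches `disturbance`.

    Returns the residual error signal and the final filter weights.
    Simplification: no secondary path is modeled, so this is plain LMS,
    not filtered-x LMS.
    """
    weights = [0.0] * num_taps
    buffer = [0.0] * num_taps              # most recent reference samples, newest first
    errors = []
    for x, d in zip(reference, disturbance):
        buffer = [x] + buffer[:-1]
        y = sum(w * b for w, b in zip(weights, buffer))          # anti-noise estimate
        e = d - y                                                # residual at the error mic
        weights = [w + mu * e * b for w, b in zip(weights, buffer)]  # LMS update
        errors.append(e)
    return errors, weights


# Hypothetical example: the duct is modeled as a short FIR acoustic path.
random.seed(0)
path = [0.6, -0.3, 0.1]
ref = [random.gauss(0.0, 1.0) for _ in range(5000)]
dist = [sum(path[k] * ref[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(ref))]
err, w = lms_cancel(ref, dist)
```

With a small step size the weights converge toward the assumed path and the residual decays toward zero; a larger `mu` converges faster but can make the filter unstable.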

Real Time Implementation of Fpga Based Pulse Code Modulation Multiplexing

by Joseph Prathap

2025

Pulse Code Modulation (PCM) is a vital part of an analog-to-digital converter (ADC). PCM comprises sampling and quantization, which digitize the analog input along the time and amplitude axes, respectively. This...
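The two PCM stages can be sketched in a few lines of Python: sampling evaluates the analog signal at multiples of the sampling period (time axis), and uniform quantization maps each sample to one of 2^b integer codes (amplitude axis). This is an illustrative software model with assumed parameters (a 440 Hz tone, 8 kHz sampling, an 8-bit mid-rise quantizer), not the paper's FPGA design.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=0.9):
    """Sampling: evaluate an 'analog' sine at t = n / fs (time-axis discretization)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

def pcm_encode(samples, num_bits):
    """Quantization: map samples in [-1, 1) to integer codes 0 .. 2**num_bits - 1."""
    step = 2.0 / 2 ** num_bits                    # width of one quantization interval
    codes = []
    for s in samples:
        s = min(max(s, -1.0), 1.0 - step)         # clip to the representable range
        codes.append(int(math.floor((s + 1.0) / step)))
    return codes

def pcm_decode(codes, num_bits):
    """Reconstruct each code as the midpoint of its quantization interval."""
    step = 2.0 / 2 ** num_bits
    return [-1.0 + (c + 0.5) * step for c in codes]

x = sample_sine(440.0, 8000.0, 80)
codes = pcm_encode(x, 8)
x_hat = pcm_decode(codes, 8)
```

For an in-range signal, the reconstruction error of this quantizer is at most half a step, that is 1/2^b; each extra bit halves it.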

Movie summarization based on audiovisual saliency detection

by Petros Maragos

2024, Proceedings - International Conference on Image Processing, ICIP

Based on perceptual and computational attention modeling studies, we formulate measures of saliency for an audiovisual stream. Audio saliency is captured by signal modulations and related multifrequency band features, extracted through...

ALSA Debugging Tools and Techniques in Linux Kernel

by anish kumar

2024, International Journal of Science and Research

This document presents tools and techniques for debugging audio issues in Linux, covering problems from boot to shutdown. This guide is not exhaustive but aims to explain potential audio issues or bugs that can arise...

Categorical Perception of Neutral Thirds Within the Musical Context

by Krzysztof Kicior

2024, Journal of the Audio Engineering Society

This paper investigates the contextual recognition of neutral thirds in music by integrating real-world musical context into the study of categorical perception. Traditionally, categorical perception has been studied using isolated...

Deep Neural Networks for the Recognition and Classification of Heart Murmurs Using Neuromorphic Auditory Sensors

by Juan Pedro Dominguez-Morales

2024, IEEE Transactions on Biomedical Circuits and Systems

Auscultation is one of the most widely used techniques for detecting cardiovascular diseases, which are among the main causes of death in the world. Heart murmurs are the most common abnormal finding when a patient visits the physician for...

Deep Spiking Neural Network model for time-variant signals classification: a real-time speech recognition approach

by Juan Pedro Dominguez-Morales

2024

Speech recognition has become an important means of improving the human-machine interface. Given the limitations of current automatic speech recognition systems, such as non-real-time cloud-based solutions or high power demand, recent...

Audio processing in police investigations

by David Luknowsky

2024, Canadian Acoustics

One of the most common types of interference, and one of the simplest to reduce, is tones and hum. Hum typically comes from power lines, fluorescent lights or other electrical sources. It is seen in the spectrum as spikes at 60 Hz and harmonics thereof (Figure 1).

Figure 2. Spectrum from figure 1 with hum reduced using a comb filter.

Figure 3. Audio waveform with clipping (top) and corrected (bottom). Total length of audio shown is 9.6 ms.
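The comb-filter hum reduction mentioned in the Figure 2 caption can be sketched as an IIR comb notch filter whose zeros sit at 0, 60, 120, ... Hz and whose poles sit just inside them to keep each notch narrow. This is a generic textbook design offered for illustration, not the tool used in the article; the 6 kHz sample rate (chosen so one 60 Hz period is exactly 100 samples), the pole radius `r`, and the test signals are assumptions. Note that the notch at 0 Hz also removes any DC offset.

```python
import math

def hum_comb_filter(x, sample_rate_hz, hum_hz=60.0, r=0.999):
    """IIR comb notch filter H(z) = g * (1 - z^-D) / (1 - a * z^-D).

    D = fs / hum_hz places zeros at every multiple of hum_hz (including 0 Hz);
    a = r**D puts the poles just inside the zeros so each notch stays narrow;
    g = (1 + a) / 2 gives unity gain halfway between adjacent notches.
    Assumes fs is an integer multiple of hum_hz.
    """
    delay = round(sample_rate_hz / hum_hz)
    a = r ** delay
    g = (1.0 + a) / 2.0
    y = [0.0] * len(x)
    for n in range(len(x)):
        x_delayed = x[n - delay] if n >= delay else 0.0
        y_delayed = y[n - delay] if n >= delay else 0.0
        y[n] = g * (x[n] - x_delayed) + a * y_delayed
    return y

fs = 6000.0
t = [n / fs for n in range(int(2 * fs))]                          # 2 s of audio
tone = [0.3 * math.sin(2 * math.pi * 1050.0 * ti) for ti in t]    # lies between notches
hum = [0.5 * math.sin(2 * math.pi * 60.0 * ti) + 0.25 * math.sin(2 * math.pi * 180.0 * ti)
       for ti in t]                                               # 60 Hz hum + 3rd harmonic
y = hum_comb_filter([s + h for s, h in zip(tone, hum)], fs)
```

After the start-up transient, the output is essentially the 1050 Hz tone alone. Moving `r` closer to 1 narrows the notches but lengthens the transient.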

Convolutional Recurrent Neural Networks for Rare Sound Event Detection

by Emre çakır

2024

Sound events possess certain temporal and spectral structure in their time-frequency representations. The spectral content for the samples of the same sound event class may exhibit small shifts due to intra-class acoustic variability....

Design Patterns for Resource-Constrained Automated Deep-Learning Methods

by Mohammadreza Amirian

2024, AI

We present an extensive evaluation of a wide variety of promising design patterns for automated deep-learning (AutoDL) methods, organized according to the problem categories of the 2019 AutoDL challenges, which set the task of optimizing...

SCOTT: a modular architecture for testing smartcard-based solutions (SCOTT: un'architettura modulare per il testing di soluzioni basate su smartcard)

by Andrea Angella

2024

In recent years there has been an incredible increase in the number of smartcards produced for the market; one need only think of the number of SIM cards in the world to get an idea of the overall sales volume, which, in ... alone...

Methodology for Active Noise Control in Ducts (Metodología de Control Activo de Ruido en Ductos)

by Freddy Murillo

2024, cybertesis.urp.edu.pe

ABSTRACT: This article presents the design, implementation, and comparison of active noise control techniques. These controllers are based on adaptive control techniques using an FIR filter and the LMS algorithm; the principle of...

Deep Neural Networks for the Recognition and Classification of Heart Murmurs Using Neuromorphic Auditory Sensors

by Manuel Domínguez

2024, IEEE Transactions on Biomedical Circuits and Systems

Auscultation is one of the most widely used techniques for detecting cardiovascular diseases, which are among the main causes of death in the world. Heart murmurs are the most common abnormal finding when a patient visits the physician for...

Efficient target-response interpolation for a graphic equalizer

by Vesa Välimäki

2024, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

A graphic equalizer is an adjustable filter in which the command gain of each frequency band is practically independent of the gains of the other bands. Designing a high-precision graphic equalizer requires evaluating a target response...

Fig. 3. Maximum error of the equalizer responses at the command points obtained by varying the parameter a in the three test cases.

The maximum ripple in the magnitude frequency response between 300 Hz and 400 Hz is 0.7 dB for both the cubic Hermite method [in Fig. 1(b)] and the LICS method (in Fig. 4). Thus, the ripple between command points is reduced in comparison with linear interpolation, and the resulting magnitude frequency response connects the command gain values more accurately than most methods in Fig. 1(b).

Fig. 4. Target response and magnitude of the frequency response obtained using the LICS method (a = 0.2), cf. Fig. 1(b).

Table 4. Execution Time for the Hermite Cubic Interpolation and the LICS Method.

Table 1. Command Frequencies $f_{c,m}$ (Hz) of a Third-Octave Graphic Equalizer.

Table 3. Maximum Errors at Command Gain Points. Acceptable Errors Are Highlighted.

...points ($f_{2,m} - f_{1,m}$) corresponds to the bands of a common third-octave graphic equalizer. However, this separation implies that there would be overlapping bands ($f_{2,m} > f_{1,m+1}$), which would prevent proper interpolation between the command points.

Table 2. Optimal and Fixed Values of a and the Resulting Error for the Three Test Cases.

Fig. 1. (a) Target responses and (b) magnitude of the frequency responses obtained from command gains by using different interpolation methods: zeroth-order, linear, cubic Lagrange, and cubic Hermite interpolation.

Fig. 2. In the LICS method, frequency points $f_{1,m}$ and $f_{2,m}$ limit the flat portion that surrounds the command gain value.
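Figure 1 compares interpolation methods for turning command gains into a target response. A minimal Python sketch of two of them, linear and cubic Hermite interpolation of dB gains on a log-frequency axis, is shown below. The finite-difference (Catmull-Rom-style) slopes used for the Hermite segments, the octave-spaced command frequencies, and the gain values are assumptions for illustration; this is not the paper's LICS method.

```python
import bisect
import math

def _segment(x, xs):
    """Index i of the interval [xs[i], xs[i+1]] containing x (clamped to the ends)."""
    i = bisect.bisect_right(xs, x) - 1
    return min(max(i, 0), len(xs) - 2)

def linear_interp(x, xs, ys):
    """Piecewise-linear interpolation between neighbouring command points."""
    i = _segment(x, xs)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1.0 - t) * ys[i] + t * ys[i + 1]

def hermite_interp(x, xs, ys):
    """Cubic Hermite interpolation with finite-difference slopes (assumed choice)."""
    n = len(xs)

    def slope(k):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        return (ys[hi] - ys[lo]) / (xs[hi] - xs[lo])

    i = _segment(x, xs)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1        # Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * ys[i] + h10 * h * slope(i)
            + h01 * ys[i + 1] + h11 * h * slope(i + 1))

def target_response(freqs_hz, command_freqs_hz, command_gains_db, method):
    """Evaluate a target magnitude response (dB) on a log2-frequency axis."""
    xs = [math.log2(f) for f in command_freqs_hz]
    return [method(math.log2(f), xs, command_gains_db) for f in freqs_hz]

command_freqs = [125.0, 250.0, 500.0, 1000.0, 2000.0]   # hypothetical octave bands
command_gains = [0.0, 6.0, -3.0, 4.0, 0.0]              # hypothetical gains in dB
probe = [math.sqrt(250.0 * 500.0)]                       # geometric midpoint of two bands
lin = target_response(probe, command_freqs, command_gains, linear_interp)
herm = target_response(probe, command_freqs, command_gains, hermite_interp)
```

Both curves pass through every command gain; they differ only between command points (here 1.5 dB versus 1.4375 dB at the probe), which is where the ripple discussed for Fig. 3 arises.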

Nonnegative matrix factorization and spatial covariance model for under-determined reverberant audio source separation

by Ngọc Ánh Dương

2024, 10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010)

We address the problem of blind audio source separation in the under-determined and convolutive case. The contribution of each source to the mixture channels in the time-frequency domain is modeled by a zero-mean Gaussian random vector...
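For orientation, the local Gaussian model described in this abstract is commonly written as below. The notation is assumed here rather than taken from the paper: $\mathbf{c}_j(n,f)$ is the spatial image of source $j$ in time frame $n$ and frequency bin $f$, $v_j(n,f)$ is its scalar spectral variance, $\mathbf{R}_j(f)$ is its spatial covariance matrix, and $\mathbf{x}(n,f)$ is the $I$-channel mixture, with $J > I$ sources in the under-determined case. The mixture covariance follows if the sources are assumed independent, and the last line is a hedged reading of the title's NMF component, with each source variance factored into nonnegative spectral patterns $w$ and activations $h$.

```latex
\mathbf{c}_j(n,f) \sim \mathcal{N}_c\big(\mathbf{0},\; v_j(n,f)\,\mathbf{R}_j(f)\big),
\qquad
\mathbf{x}(n,f) = \sum_{j=1}^{J} \mathbf{c}_j(n,f),
\qquad
\boldsymbol{\Sigma}_{\mathbf{x}}(n,f) = \sum_{j=1}^{J} v_j(n,f)\,\mathbf{R}_j(f),

v_j(n,f) = \sum_{k=1}^{K_j} w_{j,fk}\, h_{j,kn}, \qquad w_{j,fk},\, h_{j,kn} \ge 0.
```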

Time-Space Ensemble Strategies for Automatic Music Genre Classification

by Celso Kaestner

2024, Springer eBooks

In this paper we propose a novel time-space ensemble-based approach for the task of automatic music genre classification. Ensemble strategies apply several classifiers to different views of the problem space, and combination rules in...

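Support for Grid-aware programming: experiences with the dynamic allocation of component-based ASSIST programs (Supporti alla programmazione Grid-aware - esperienze di allocazione dinamica di programmi ASSIST a componenti)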

by Luigi Presti

2024

This thesis proposes and experimentally evaluates a methodology for supporting grid-aware applications expressed as pure ASSIST parallel modules and as modules encapsulated in CCM (CORBA Component Model) components. The methodology that is the subject of the thesis...

MixedEmotions: An Open-Source Toolbox for Multimodal Emotion Analysis

by Francesco Adolfo Danza

2024, IEEE Transactions on Multimedia

Recently, there has been an increasing tendency to embed functionalities for recognizing emotions from user-generated media content in automated systems such as call-centre operations, recommendations, and assistive technologies, providing...

Fig. 1. Orchestrator within the MixedEmotions Toolbox.

Fig. 2. Knowledge Graph module architecture.

Fig. 3. Architecture of the Social Context Analysis module ‘Scanner’.

Table I. A short list of available emotion analyzer services for T(extual), F(acial), and S(peech) contents.

B. Audio Processing

This module recognizes emotions in terms of arousal and valence from speech signals. It is based on the Bag-of-Audio-Words (BoAW) approach [62], trained on continuous, emotionally labeled data (the RECOLA database [63]). RECOLA is an audio-visual database of 46 subjects in dyadic conversation in French. For each subject, a 5-minute recording has been annotated time-continuously for the arousal and valence dimensions by six annotators (3 female, 3 male). From the six annotations, a single gold-standard sequence has been computed for each dimension using an evaluator weighted estimator [64].

Performance (CCC) of the audio emotion recognition module on the RECOLA database.

To make the power of the histogram independent of the duration of the input segment, histogram normalization is performed. The whole BoAW processing is carried out by the open-source toolkit openXBOW [68].
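The bag-of-audio-words idea and the duration normalization described here can be sketched as follows: each frame-level feature vector is assigned to its nearest codebook entry, the assignments are counted, and the counts are divided by the number of frames. Dividing by the frame count is an assumed choice of normalization, and the helper names and tiny codebook are hypothetical; this is not openXBOW's interface.

```python
def nearest_codeword(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vector, codebook[k])))

def bag_of_audio_words(frames, codebook, normalize=True):
    """Histogram of codeword assignments for a sequence of feature frames.

    With normalize=True the counts are divided by the number of frames, so
    segments of different durations yield comparable histograms.
    """
    counts = [0.0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1.0
    if normalize and frames:
        counts = [c / len(frames) for c in counts]
    return counts

codebook = [[0.0, 0.0], [1.0, 1.0]]            # hypothetical 2-word codebook
short_segment = [[0.1, 0.0], [0.9, 1.0]]        # two frames of 2-D features
long_segment = short_segment * 3                # same content, three times as long
print(bag_of_audio_words(short_segment, codebook))   # [0.5, 0.5]
print(bag_of_audio_words(long_segment, codebook))    # [0.5, 0.5]
```

Without normalization the longer segment would give [3.0, 3.0], so a classifier would see a duration effect rather than a content difference.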

Linked Data principles have been investigated to define protocols and approaches for linking the information in these sources to each other [86]. In the MixedEmotions Toolbox, the JSON Linked Data (JSON-LD) format has been used for this task (Section III-D1). The use of linked data formats allows us to easily connect resources to common-sense knowledge captured in knowledge graphs such as DBpedia [50] (Section III-D2).

Detection of Acoustic Change-Points in Audio Streams and Signal Segmentation

by Jindřich Žďánský

2024

This contribution proposes an efficient method for detecting relevant changes in a continuous stream of sound. The detected change-points can then be used to segment long audio recordings into shorter and more or less...

Nonnegative signal factorization with learnt instrument models for sound source separation in close-microphone recordings

by Francisco José Rodríguez Serrano

2024, EURASIP Journal on Advances in Signal Processing

Close-microphone techniques are extensively employed in many live music recordings, allowing for interference rejection and reducing the amount of reverberation in the resulting instrument tracks. However, despite the use of directional...

Identification of Active Sources in Single-Channel Convolutive Mixtures Using Known Source Models

by Thippur Sreenivas

2024, IEEE Signal Processing Letters

We address the problem of identifying the constituent sources in a single-sensor mixture signal consisting of contributions from multiple simultaneously active sources. We propose a generic framework for mixture signal analysis based on a...

L2 Process Management AS. (L2 Gestione Processi AS.)

by Zambelli Filippo

2024

Feedback-Based Gameplay Metrics and Gameplay Performance Segmentation: An audio-visual approach for assessing player experience

by Raphaël Marczak

2024

Finally, I would like to thank all my friends and family who tried to understand and respect my choice, and who never tried to prevent me from doing this "crazy" jump into the void. I am coming back better and stronger, and this is all...

First steps toward embedding real-time audio computing in Antescofo

by Jean-Louis Giavitto

2024, HAL (Le Centre pour la Communication Scientifique Directe)

The development of the Antescofo software has allowed contemporary musicians to create interactive music pieces with more precise synchronization between human and machine. INRIA's MUTANT team has been developing a version of...

Figure 4. Anthèmes 2 effects patch in Max/MSP.

After this change, the total time spent computing the dspchain_tick function was 5900 ms for the Anthèmes 2 score using the effects in external patches, and 3150 ms for the Antescofo-Faust version of the score. Each figure is the sum of the durations of all calls to dspchain_tick during the non-real-time performance. This corresponds to a reduction of about 46.6% in computation time, which is significant given that the target is the UDOO platform. Because the UDOO has far fewer computational resources than the Mac computer used in these tests, this improvement should allow Antescofo to run considerably faster and more efficiently on the mini-computer.

Figure 6. Time profiler for Anthèmes 2 using Faust.
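The relative reduction and the corresponding speedup follow directly from the two reported dspchain_tick totals:

```latex
\frac{5900\,\mathrm{ms} - 3150\,\mathrm{ms}}{5900\,\mathrm{ms}}
  = \frac{2750}{5900} \approx 0.466,
\qquad
\frac{5900\,\mathrm{ms}}{3150\,\mathrm{ms}} \approx 1.87
```

That is, roughly 46.6% less total computation time, or about a 1.87x speedup on the test machine.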

Multi-Source Musaicing Using Non-Negative Matrix Factor 2-D Deconvolution

by Hadrien Foroughmand

2024, HAL (Le Centre pour la Communication Scientifique Directe)

A recent trend is to use Music Information Retrieval algorithms for creative purposes. When considering the audio signal as the observation, a well-known method of data-driven synthesis is "concatenative synthesis", also called musaicing (audio...

Let It Bee - Towards Nmf-Inspired Audio Mosaicing

by Meinard Müller

2024

A swarm of bees buzzing "Let it be" by the Beatles, or the wind gently howling the romantic "Gute Nacht" by Schubert: these are examples of audio mosaics as we want to create them. Given a target and a source recording, the goal of audio...
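The core mechanism of NMF-inspired mosaicing can be sketched as learning nonnegative activations H for a fixed template matrix W so that WH approximates the target spectrogram V. This sketch assumes the source's spectral frames serve as the fixed templates and uses the standard Lee-Seung multiplicative update for the generalized Kullback-Leibler divergence; any modifications the paper makes to the updates are not reproduced, and the tiny matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def learn_activations(V, W, num_iters=200, eps=1e-12):
    """Learn nonnegative activations H so that W @ H approximates V, with W fixed.

    Multiplicative update for the generalized KL divergence (Lee-Seung form):
        H <- H * (W^T (V / (W H))) / (W^T 1)
    V: F x N nonnegative target, W: F x K nonnegative templates.
    """
    F, N, K = len(V), len(V[0]), len(W[0])
    H = [[1.0] * N for _ in range(K)]                               # positive init
    col_sums = [sum(W[f][k] for f in range(F)) for k in range(K)]   # W^T 1
    for _ in range(num_iters):
        WH = matmul(W, H)
        ratio = [[V[f][n] / (WH[f][n] + eps) for n in range(N)] for f in range(F)]
        for k in range(K):
            for n in range(N):
                numerator = sum(W[f][k] * ratio[f][n] for f in range(F))
                H[k][n] *= numerator / (col_sums[k] + eps)
    return H

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 frequency bins x 2 template frames
H_true = [[2.0, 0.5], [1.0, 3.0]]
V = matmul(W, H_true)                      # target exactly representable by W
H = learn_activations(V, W)
```

Because only H is updated, every output frame is a nonnegative mixture of the fixed templates, which is what lets the result retain the timbre of the source material.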

Acoustic Analysis of Voluntary Coughs, Throat Clearings, and Induced Reflexive Coughs in a Healthy Population

by Dirk Van Gestel

2024, Dysphagia

Cough efficacy is considered a reliable predictor of the aspiration risk in head and neck cancer patients with radiation-associated dysphagia. Currently, coughing is assessed perceptually or aerodynamically. The goal of our research is to...

Machine Learning Evaluation for Music Genre Classification of Audio Signals

by Siddhant Pathak

2024, International Journal of Grid and High Performance Computing

Music genre classification is a popular task today, and machine learning can play an important role in music streaming. This research article proposes a machine-learning-based model for the classification of...

Least squares signal declipping for robust speech recognition

by Richard Stern

2024, Interspeech 2014

This paper introduces a novel declipping algorithm based on constrained least-squares minimization. Digital speech signals are often sampled at 16 kHz and classic declipping algorithms fail to accurately reconstruct the signal at this...

Using computer audio-recorded interviewing to assess interviewer coding error

by mai nguyen

2023, Proceedings of the …

Computer-assisted audio recordings provide a new approach for detecting and correcting interviewer coding error. For questions with categorical and other specify responses, it is possible for the interviewer to misinterpret, abbreviate,...

Matching live sources with physical models

by Mark Sandler

2023

This paper investigates the use of a physical-model template database as the parameter basis for an MPEG-4 Structured Audio (MP4-SA) codec. During analysis, the codec attempts to match the closest corresponding instrument in the database....

Audio Analysis as a Control Knob for Social Sensing

by Graham Bent

2023

While humans can act as effective sensors, human input is subject to a high degree of error and highly dependent on the context. Furthermore, extracting the signal from the noise for social sensing is a difficult challenge. One approach...

Regularized Multivariate Analysis Framework for Interpretable High-Dimensional Variable Selection

by Vanessa Gómez-verdejo

2023, IEEE Computational Intelligence Magazine

Multivariate Analysis (MVA) comprises a family of well-known methods for feature extraction which exploit correlations among input variables representing the data. One important property that is enjoyed by most such methods is...

Energy Efficient Animal Sound Recognition Scheme in Wireless Acoustic Sensors Networks

by Badour AlMulhem

2023, International journal of wireless and mobile networks

Wireless sensor networks (WSNs) have proliferated rapidly as a cost-effective solution for data aggregation and measurement in challenging environments. Sensors in WSNs are cheap and powerful, and they consume limited energy. The energy...

Structured audio and effects processing in the MPEG-4 multimedia standard

by Eric Scheirer

2023, Elsevier eBooks

While previous generations of the MPEG multimedia standard have focused primarily on coding and transmission of content digitally sampled from the real world, MPEG-4 contains extensive support for structured, synthetic and...

Detection of explosive cough events in audio recordings by internal sound analysis

by Rui Pedro Paiva

2023, 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

We present a new method for the discrimination of explosive cough events, which is based on a combination of spectral content descriptors and pitch-related features. After the removal of near-silent segments, a vector of event boundaries...

A Perceptual Subspace Approach for Modeling of Speech and Audio Signals With Damped Sinusoids

by Søren Holdt Jensen

2023, IEEE Transactions on Speech and Audio Processing

The problem of modeling a signal segment as a sum of exponentially damped sinusoidal components arises in many different application areas, including speech and audio processing. Often, model parameters are estimated using subspace based...
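The signal model in question can be written explicitly. In the notation assumed here (not necessarily the paper's), a length-$N$ segment is approximated by $K$ exponentially damped sinusoids with amplitudes $a_k$, damping factors $\alpha_k \ge 0$, angular frequencies $\omega_k$, and phases $\phi_k$. Each real component can be rewritten as a conjugate pair of complex exponentials with pole $z_k$, which is the form that subspace-based estimators work with.

```latex
\hat{x}[n] = \sum_{k=1}^{K} a_k\, e^{-\alpha_k n} \cos(\omega_k n + \phi_k),
\qquad n = 0, 1, \dots, N-1,

a_k e^{-\alpha_k n}\cos(\omega_k n + \phi_k)
  = \tfrac{a_k}{2}\left(e^{j\phi_k} z_k^{\,n} + e^{-j\phi_k} \bar{z}_k^{\,n}\right),
\qquad z_k = e^{-\alpha_k + j\omega_k}.
```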

Occupancy Estimation in Smart Buildings using Audio-Processing Techniques

by Zhenhao Ge

2023, arXiv (Cornell University)

In the past few years, several case studies have illustrated that the use of occupancy information in buildings leads to energy-efficient and low-cost HVAC operation. The widely presented techniques for occupancy estimation include...

Multiple Feature Resolutions for Different Polyphonic Sound Detection Score Scenarios in DCASE 2021 Task 4

by Sergio Segovia

2023

In this paper, we describe our multi-resolution mean teacher systems for DCASE 2021 Task 4: Sound event detection and separation in domestic environments. Aiming to take advantage of the different lengths and spectral characteristics of...

Untitled (title extracted from a block diagram: Signal Sensor 1, Signal Sensor 2, Feature Extraction, Feature Selection, ANN Training, Test Signals and Labels, Test Results)

by Carlos Valderrama

2023

Cough detection and classification are necessary tools for evaluating pathology severity in chronic illnesses. Several approaches have been proposed in the literature for this purpose. These approaches have achieved relative success since...

Acoustic Analysis of Voluntary Coughs, Throat Clearings, and Induced Reflexive Coughs in a Healthy Population

by Dirk Van Gestel

2023, Dysphagia

Cough efficacy is considered a reliable predictor of the aspiration risk in head and neck cancer patients with radiation-associated dysphagia. Currently, coughing is assessed perceptually or aerodynamically. The goal of our research is to...

A fast iterative kernel PCA feature extraction for hyperspectral images

by Aleksandra Pizurica

2023, 2010 IEEE International Conference on Image Processing

A fast iterative Kernel Principal Component Analysis (KPCA) is proposed to extract features from hyperspectral images. The proposed method is a kernel version of the Candid Covariance-Free Incremental Principal Component Analysis, which...

Tempo extraction using beat histograms

by George Tzanetakis

2023, Proceedings of the 1st Music Information Retrieval Evaluation eXchange (MIREX 2005)

This abstract describes the tempo extraction algorithm used for the University of Victoria submission to the MIREX (Music Information Retrieval Exchange) 2005. The algorithm is mostly based on self-similarity rather than onset detection....
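As a rough illustration of periodicity-based tempo estimation (not the University of Victoria submission itself, whose beat-histogram and self-similarity details are not reproduced here), the sketch below picks the beat period that maximizes the autocorrelation of an onset-strength envelope within an assumed 60-180 BPM range. The envelope, the frame rate, and the tempo bounds are hypothetical inputs.

```python
def estimate_tempo_bpm(envelope, frame_rate_hz, min_bpm=60.0, max_bpm=180.0):
    """Return the tempo (BPM) whose beat period maximizes envelope autocorrelation.

    envelope: onset-strength values, one per analysis frame.
    The raw (biased) autocorrelation sum is used: longer lags sum fewer terms,
    so a lag and its multiples do not tie and the shorter lag is favoured.
    """
    mean = sum(envelope) / len(envelope)
    env = [e - mean for e in envelope]                      # remove the DC offset
    min_lag = max(1, int(round(60.0 * frame_rate_hz / max_bpm)))
    max_lag = min(len(env) - 1, int(round(60.0 * frame_rate_hz / min_bpm)))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(env[n] * env[n - lag] for n in range(lag, len(env)))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate_hz / best_lag


# Hypothetical envelope: one onset every 50 frames at 100 frames/s, i.e. 120 BPM.
envelope = [1.0 if n % 50 == 0 else 0.0 for n in range(1000)]
bpm = estimate_tempo_bpm(envelope, 100.0)
```

Real systems typically also weight candidate tempi or collect several autocorrelation peaks into a histogram, since the strongest peak is often at a multiple or fraction of the perceived beat.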
