Audio Event Detection Research Papers

Detection of Anomalous Sounds for Machine Condition Monitoring Using Classification Confidence

2025

Anomaly-detection methods based on classification confidence are applied to the DCASE 2020 Task 2 Challenge on Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The final systems submitted to the challenge...
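The abstract does not spell out the scoring rule. Below is a minimal sketch of one common way to turn classification confidence into an anomaly score. It assumes a classifier trained on normal clips to predict which machine ID produced each clip; a clip whose known ID receives a low softmax probability is scored as more anomalous. The function names are hypothetical, and this is not necessarily the authors' exact formulation.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def confidence_anomaly_score(logits: np.ndarray, true_ids: np.ndarray) -> np.ndarray:
    """Hypothetical score: negative log-probability of the known machine ID.

    logits:   (n_clips, n_ids) outputs of a machine-ID classifier
    true_ids: (n_clips,) index of the machine that actually recorded each clip
    Higher score = lower confidence = more likely anomalous.
    """
    probs = softmax(logits)
    p_true = probs[np.arange(len(true_ids)), true_ids]
    return -np.log(np.clip(p_true, 1e-12, None))

# Two clips from machine 0: the first is classified confidently, the second is not.
logits = np.array([[4.0, 0.0, 0.0],
                   [0.5, 0.4, 0.3]])
scores = confidence_anomaly_score(logits, np.array([0, 0]))
print(scores)  # the less confident second clip receives the higher score
```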

Overlapped Music segmentation using a new Effective Feature and Random Forests

by duraid Mohammed

2025, IAES International Journal of Artificial Intelligence (IJ-AI)

In the field of audio classification, audio signals may be broadly divided into three classes: speech, music and events. Most studies, however, neglect that real audio soundtracks can have any combination of these classes simultaneously....

Improving Deep Learning Sound Events Classifiers Using Gram Matrix Feature-Wise Correlations

by Andre Pacheco

2025, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we propose a new Sound Event Classification (SEC) method which is inspired by recent work on out-of-distribution detection. In our method, we analyse all the activations of a generic CNN in order to produce feature...
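Below is a minimal sketch of Gram-matrix (channel-correlation) features computed from CNN activations. It assumes per-layer feature maps of shape (channels, time, frequency). The abstract is cut off before explaining how these correlations feed the final classifier. The normalization and upper-triangle flattening here are illustrative choices, not the paper's.

```python
import numpy as np

def gram_features(feature_map: np.ndarray) -> np.ndarray:
    """Channel-wise correlation (Gram) features for one CNN layer.

    feature_map: (channels, time, freq) activations of a single clip.
    Returns the upper triangle (incl. diagonal) of the C x C Gram matrix,
    normalized by the number of time-frequency positions.
    """
    c = feature_map.shape[0]
    flat = feature_map.reshape(c, -1)          # (C, T*F)
    gram = flat @ flat.T / flat.shape[1]       # (C, C), symmetric
    rows, cols = np.triu_indices(c)
    return gram[rows, cols]

def clip_descriptor(layer_maps: list[np.ndarray]) -> np.ndarray:
    """Concatenate Gram features from several layers into one vector."""
    return np.concatenate([gram_features(m) for m in layer_maps])

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 20, 16)), rng.standard_normal((16, 10, 8))]
desc = clip_descriptor(layers)
print(desc.shape)  # 8*9/2 + 16*17/2 = 172 entries
```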

Multimodal Evaluation Method for Sound Event Detection

by Aomar Osmani

2025, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Time is an important dimension in sound event detection (SED) systems. However, the methods used to evaluate the performance of SED systems are taken directly from the classical machine learning domain and are not well adapted to the needs of these...

A Multi-Resolution CRNN-Based Approach for Semi-Supervised Sound Event Detection in DCASE 2020 Challenge

by Doroteo Toledano

2025, IEEE Access

Sound Event Detection is a task of rising relevance in recent years in the field of audio signal processing, due to the creation of specific datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection)...

Analysis and interpretation of joint source separation and sound event detection in domestic environments

by Doroteo Toledano

2025, PLOS ONE

In recent years, the relation between Sound Event Detection (SED) and Source Separation (SSep) has received growing interest, in particular with the aim of enhancing the performance of SED by leveraging the synergies between both tasks....

Enhancing Conformer-Based Sound Event Detection Using Frequency Dynamic Convolutions and BEATs Audio Embeddings

by Doroteo Toledano

2025

Over the last few years, most tasks employing Deep Learning techniques for audio processing have achieved state-of-the-art results with Conformer-based systems. However, when it comes to sound event detection (SED), it was...

Unsupervised Anomaly Detection on Temporal Multiway Data

by Phuoc Nguyen

2025, 2020 IEEE Symposium Series on Computational Intelligence (SSCI)

Temporal anomaly detection looks for irregularities over space-time. Unsupervised temporal models employed thus far typically work on sequences of feature vectors, and much less on temporal multiway data. We focus our investigation on...

Audio-based Anomaly Detection in Industrial Machines Using Deep One-Class Support Vector Data Description

by Sertaç Kılıçkaya

2024

The frequent breakdowns and malfunctions of industrial equipment have driven increasing interest in utilizing cost-effective and easy-to-deploy sensors, such as microphones, for effective condition monitoring of machinery. Microphones...
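Deep SVDD trains an encoder so that embeddings of normal clips cluster around a fixed center c. A new clip is then scored by its squared distance from c. The sketch below covers only the scoring side and assumes the embeddings come from an already-trained network. Fixing c as the mean of the normal training embeddings follows the usual Deep SVDD recipe; the function names and toy data are hypothetical.

```python
import numpy as np

def fit_center(train_embeddings: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Hypersphere center c: mean of the normal training embeddings.

    Coordinates very close to zero are pushed to +/-eps, a common trick to
    avoid a trivial all-zero solution when the encoder is trained.
    """
    c = train_embeddings.mean(axis=0)
    small = np.abs(c) < eps
    c[small & (c < 0)] = -eps
    c[small & (c >= 0)] = eps
    return c

def svdd_score(embeddings: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Anomaly score: squared Euclidean distance to the center."""
    return np.sum((embeddings - center) ** 2, axis=1)

normal_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]])
center = fit_center(normal_train)
test = np.array([[1.1, 0.9],   # close to the normal cluster
                 [4.0, 4.0]])  # far from it
scores = svdd_score(test, center)
print(scores)
```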

Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review

by Noel Zacarias-Morales and

2024, Symmetry

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution.

I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch

by Joseph Turian

2024, arXiv (Cornell University)

Growing research demonstrates that synthetic failure modes imply poor generalization. We compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the pitch distance between two stationary sinusoids. The results are...
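Below is a minimal illustration of the failure mode described in the abstract. It assumes a single-resolution L1 distance between Hann-windowed magnitude spectra; the paper compares several losses, and this particular one is our choice. Once two stationary sinusoids are far enough apart that their spectral peaks no longer overlap, the distance stops growing. As a result, a 10 Hz error and a two-octave error look alike to this loss.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumed)

def sine(freq_hz: float, seconds: float = 1.0) -> np.ndarray:
    t = np.arange(int(SR * seconds)) / SR
    return np.sin(2 * np.pi * freq_hz * t)

def spectral_l1(x: np.ndarray, y: np.ndarray) -> float:
    """L1 distance between magnitude spectra using a periodic Hann window."""
    w = np.hanning(len(x) + 1)[:-1]  # periodic Hann of length len(x)
    mx = np.abs(np.fft.rfft(x * w))
    my = np.abs(np.fft.rfft(y * w))
    return float(np.sum(np.abs(mx - my)))

ref = sine(440.0)
dist = {f: spectral_l1(ref, sine(f)) for f in (441.0, 450.0, 880.0, 1760.0)}
for f, d in dist.items():
    print(f"440 Hz vs {f:.0f} Hz: {d:.0f}")
```

Only the nearly overlapping pair (440 Hz vs 441 Hz) gets a smaller distance. The 450, 880 and 1760 Hz targets all receive essentially the same value, so this loss carries no gradient signal about how far off the pitch is.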

A Deep Learning Approach for Unsupervised Failure Detection in Smart Industry (Discussion Paper)

by Salvatore Iiritano

2024, SEBD

We propose an unsupervised anomaly detection model that is able to identify abnormal behavior by analysing streaming data coming from IoT sensors installed on critical devices. The proposed model is based on a Siamese neural network which...

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

by xianjun xia

2024, arXiv (Cornell University)

In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 (Acoustic Scene Classification, ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a...

Anomaly Detection in Traffic Trajectories Using a Combination of Fuzzy, Deep Convolutional and Autoencoder Networks

by Journal of Computer and Knowledge Engineering

2024, Journal of Computer and Knowledge Engineering

Due to the increasing deployment of vehicles in human societies and the necessity for smart traffic control, anomaly detection is among the various tasks widely employed in traffic monitoring. As the issue of urban traffic and their...

Neural Network Distillation on IoT Platforms for Sound Event Detection

by Rahul Prasad

2024, Interspeech 2019

In most classification tasks, wide and deep neural networks perform and generalize better than their smaller counterparts, in particular when they are exposed to large and heterogeneous training sets. However, in the emerging field of...
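The abstract does not give the training objective. Below is a minimal sketch of one common distillation setup, assuming a multi-label SED model with sigmoid outputs. The small student is trained on a weighted mix of two terms: the usual binary cross-entropy against the ground-truth labels, and a cross-entropy against the teacher's temperature-softened probabilities. The names `alpha` and `temperature` and their defaults are hypothetical, not values from the paper.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def bce(target: np.ndarray, pred: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over all clips and classes."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha: float = 0.5, temperature: float = 2.0) -> float:
    """Hypothetical multi-label distillation objective.

    alpha weights the ground-truth term; (1 - alpha) weights agreement with
    the teacher's temperature-softened per-class probabilities.
    """
    hard = bce(labels, sigmoid(student_logits))
    soft_targets = sigmoid(teacher_logits / temperature)
    soft = bce(soft_targets, sigmoid(student_logits / temperature))
    return alpha * hard + (1 - alpha) * soft

teacher = np.array([[3.0, -3.0]])  # large network: class 0 present, class 1 absent
labels = np.array([[1.0, 0.0]])
print(distillation_loss(teacher, teacher, labels),   # student agrees with teacher
      distillation_loss(-teacher, teacher, labels))  # student contradicts it
```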

Anomalous Sound Detection For Road Surveillance based On Graph Signal Processing

by Thierry BOUWMANS

2024, EUSIPCO 2024

Recently, Anomalous Sound Detection (ASD) has emerged as a promising method for road surveillance. However, since the ratio of anomalous events is generally too small, anomaly detection in general, and ASD in particular, are mainly...

Musical Instrument Synthesis and Morphing in Multidimensional Latent Space Using Variational, Convolutional Recurrent Autoencoders

by Emre çakır

2024, Journal of The Audio Engineering Society

In this work, we propose a deep-learning-based method, namely, variational, convolutional recurrent autoencoders (VCRAE), for musical instrument synthesis. This method utilizes the higher-level time-frequency representations extracted by...

Deep Neural Networks for Sound Event Detection

by Emre çakır

2024

The objective of this thesis is to develop novel classification and feature learning techniques for the task of sound event detection (SED) in real-world environments. Throughout their lives, humans experience a consistent learning...

An Encoder-Decoder Based Approach for Anomaly Detection with Application in Additive Manufacturing

by Alberto Sangiovanni Vincentelli

2024, arXiv (Cornell University)

We present a novel unsupervised deep learning approach that utilizes the encoder-decoder architecture for detecting anomalies in sequential sensor data collected during industrial manufacturing. Our approach is designed not only to detect...

DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System

by Ankit Shah

2024

DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper...

Capturing Musical Structure Using Convolutional Recurrent Latent Variable Model

by Shlomo Dubnov

2024

In this paper, we present a model for learning musical features and generating novel sequences of music. Our model, the Convolutional-Recurrent Variational Autoencoder (C-RVAE), captures short-term polyphonic sequential musical structure...

Figure 2: Approximating our distribution. The loss, the KL loss, and the reconstruction loss were computed during training over 2k epochs. The training MIDI data were processed into 512 individual time steps, using the encoded tempo to determine the time resolution such that each time step was one eighth note. At each training step, we set a frame of music to be half of a bar and train on 8 bars at a time, so that 16 frames are reconstructed in each training step.
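The caption reports three curves: the loss, the KL loss, and the reconstruction loss. Below is a minimal sketch of how those terms relate in a standard VAE objective. It assumes a diagonal-Gaussian posterior, a standard-normal prior, and a binary piano-roll target scored with binary cross-entropy. The C-RVAE's actual reconstruction term and any KL weighting are not stated here.

```python
import numpy as np

def kl_to_standard_normal(mu: np.ndarray, logvar: np.ndarray) -> np.ndarray:
    """KL(N(mu, exp(logvar)) || N(0, I)) summed over latent dims, per example."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)

def reconstruction_bce(x: np.ndarray, x_hat: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Binary cross-entropy for 0/1 piano-roll targets, summed per example."""
    x_hat = np.clip(x_hat, eps, 1 - eps)
    per_cell = x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat)
    return -np.sum(per_cell, axis=tuple(range(1, x.ndim)))

def vae_loss(x, x_hat, mu, logvar):
    """Return (total, reconstruction, KL), each averaged over the batch."""
    recon = reconstruction_bce(x, x_hat)
    kl = kl_to_standard_normal(mu, logvar)
    return float(np.mean(recon + kl)), float(np.mean(recon)), float(np.mean(kl))

# One example: a 2-step x 2-pitch piano roll, an uninformative decoder output,
# and a posterior equal to the prior (so the KL term is zero).
x = np.array([[[1.0, 0.0], [0.0, 1.0]]])
total, recon, kl = vae_loss(x, np.full_like(x, 0.5), np.zeros((1, 4)), np.zeros((1, 4)))
print(total, recon, kl)
```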

For proof of concept, a composer participated in the listening test of our generated musical output. One of the most difficult challenges in music composition by neural networks is to verify the quality of the music itself. To verify our model and results, we analyzed the musical structure with the composer. We randomly selected 10 generation results of the C-RVAE model, and composers were asked to describe the characteristics of each generated sequence: "The computer almost catches the primary structure of melodic skeleton. And the original chord progression is being reduced to 3 chords, which forms a structured music." and "The ending chord is very satisfied since it isn’t on a clear and strong musical cadence. If the bass note D could be replaced by the G, it might be more persuasive." The C-RVAE model can compose music more dynamically while including the original theme. Our sample results are also posted on SoundCloud.

Figure 3: Comparison of music used as input and generated result. Top: the first 10 bars of the training input data. Bottom: the first 10 bars of the generated musical output.

A Relational Database Model and Tools for Environmental Sound Recognition

by Abdussamet Tanıs

2024, Advances in Science, Technology and Engineering Systems Journal

Environmental sound recognition (ESR) has become a hot topic in recent years. ESR is mainly based on machine learning (ML), and ML algorithms first require a training database. This database must comprise the sounds to be recognized and...

Deep Feature Learning for Wireless Spectrum Data

by Mihael Mohorcic

2024, arXiv (Cornell University)

In recent years, the traditional feature engineering process for training machine learning models is being automated by the feature extraction layers integrated in deep learning architectures. In wireless networks, many studies were...

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

by Philip Jackson

2024, IEEE/ACM Transactions on Audio, Speech, and Language Processing

Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in the acoustic scene of interest. In this paper, we make contributions to audio tagging in two parts, respectively, acoustic modeling and...

where $E_{\mathrm{MSE}}$ and $E_{\mathrm{BCE}}$ are the mean squared error and binary cross-entropy, $\hat{T}_n(X_{n-\tau}^{n+\tau}, W, b)$ and $T_n$ denote the estimated and reference tag vectors at sample index $n$, respectively, with $N$ representing the mini-batch size and $X_{n-\tau}^{n+\tau}$ being the input audio feature vector, where the window size of context is $2\tau + 1$. It should be noted that the input window size should cover a large ...

Fig. 1 shows the proposed DNN-based audio tagging framework using the shrinking structure, i.e., the hidden layer size is gradually reduced through depth. In [23], it is shown that this structure can reduce the model size and the training and test time without losing classification accuracy. Furthermore, this structure can serve as a deep PCA [28] to reduce the redundancy and background noise in the audio recordings. With the proposed framework, a large set of features of the chunk are encoded into a vector with values {0, 1}. Sigmoid was used as the activation function of the output layer to learn the presence probability of certain events. Rectified linear unit (ReLU) is the activation function for hidden units. Mean squared error (MSE) and binary cross-entropy were adopted and compared as objective functions. As the labels of the audio tagging are binary values, binary cross-entropy can give faster training and better performance than MSE [29]. A stochastic gradient descent algorithm is performed in mini-batches with multiple epochs to improve learning convergence as follows:

Fig. 2. A typical one-hidden-layer de-noising auto-encoder [38] structure with an encoder and a decoder. Some input units are set to zero by the Dropout process (shown by a black cross “X”) to train a more robust system.

Fig. 3. The framework of deep asymmetric DAE (asyDAE) based unsupervised feature learning for audio tagging. The weights between the encoder and the decoder are untied to retain more contextual information in the bottleneck layer (shown in the dashed rectangle).

Fig. 4. The reconstruction error over the CV set of the asymmetric DAE with 50 ReLU units in the bottleneck layer (denoted as asyDAE-50ReLU) and the symmetric DAE with 200 ReLU units in the bottleneck layer (denoted as syDAE-200ReLU).

Fig. 5. The box-and-whisker plot of EERs, among the GMM baseline, Mel-Filter bank (MFB)-DNN baseline and asymmetric DAE (asyDAE)-DNN method, across five standard folds on the development set. *Lidy-CQT-CNN [18] did not measure the EER results on the development set.
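The equal error rate (EER) plotted here, and used throughout the tagging results, is the operating point where the false-positive and false-negative rates are equal. Below is a minimal per-tag sketch. It sweeps thresholds over the observed scores and reports the point where the two rates are closest. Official DCASE 2016 scoring may interpolate differently, so treat this as an approximation.

```python
import numpy as np

def equal_error_rate(scores, labels) -> float:
    """Approximate EER for one tag.

    scores: higher = tag more likely present; labels: 1 present, 0 absent.
    Thresholds are swept over the observed scores, and the EER is reported at
    the threshold where FPR and FNR are closest (no interpolation).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = labels == 1, labels == 0
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        fpr = np.mean(accept[neg])   # absent tags wrongly accepted
        fnr = np.mean(~accept[pos])  # present tags wrongly rejected
        if abs(fpr - fnr) < best_gap:
            best_gap, eer = abs(fpr - fnr), (fpr + fnr) / 2
    return float(eer)

print(equal_error_rate([0.1, 0.6, 0.4, 0.9], [0, 0, 1, 1]))
```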

Fig. 6. Spectrograms of the reconstructed Mel-Filter Banks (MFBs) by the deep asymmetric DAE (asyDAE) and deep symmetric DAE (syDAE), and also the original MFBs. The dotted ovals indicate the smoothed parts on the reconstructed MFBs. The Y-axis is the frequency bin and the X-axis is the frame number.

Fig. 7. EERs on Fold 1 of the development set evaluated using different numbers of frame expansions in the input layer of the MFB-DNN.

PRECISION, RECALL AND SCORE COMPARISONS BETWEEN THE MFB-DNN BASELINE AND THE ASYDAE-DNN METHOD, WHICH ARE EVALUATED FOR SEVEN TAGS ON THE FINAL EVALUATION SET OF THE DCASE2016 AUDIO TAGGING TASK

Fig. 8. EERs on Fold 1 of the development set evaluated using different features, namely MFCCs and Mel-Filter Banks (MFBs), and different loss functions, namely mean squared error (MSE) and binary cross-entropy (BCE).
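The text attributes the faster training of BCE to the binary tag labels. Below is a minimal sketch of the standard argument, assuming a sigmoid output unit. The BCE gradient with respect to the pre-sigmoid output is simply p - y. The MSE gradient carries an extra p(1 - p) factor that nearly vanishes when the sigmoid saturates on a wrong answer. This illustrates the general mechanism; it is not taken from the paper.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def grad_wrt_logit(z: float, y: float) -> tuple[float, float]:
    """Gradients of BCE and MSE w.r.t. the pre-sigmoid output z of one tag."""
    p = sigmoid(z)
    grad_bce = p - y                          # d/dz of -[y log p + (1 - y) log(1 - p)]
    grad_mse = 2.0 * (p - y) * p * (1.0 - p)  # d/dz of (p - y)^2
    return float(grad_bce), float(grad_mse)

# A tag that is present (y = 1) but confidently predicted absent (z = -6):
g_bce, g_mse = grad_wrt_logit(-6.0, 1.0)
print(g_bce, g_mse)  # the BCE gradient is roughly 200x larger in magnitude
```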

Fig. 9. EERs on Fold 0 of the development set evaluated using different de-noising auto-encoder configurations and compared with the MFB-DNN baseline. syDAE-ReLU200 means the symmetric DAE with 200 ReLU units in the bottleneck layer. asyDAE-Linear50 means the asymmetric DAE with 50 linear units in the bottleneck layer. aAE-ReLU50 denotes the asymmetric auto-encoder without de-noising (or dropout).

Fig. 10. The audio spectrogram of the deep asymmetric DAE (asyDAE) features with the non-negative representation.

... representation or optimized feature of the original MFBs. The units of the bottleneck layer in the deep asyDAE are all activated by the ReLU functions as mentioned in Sec. III. Hence, the values of the learned feature are all non-negative, leading to a non-negative representation of the original MFBs. Such a non-negative representation can then be multiplied with the weights in the decoding part of the DAE to obtain the reconstructed MFBs. It is also adopted to replace the MFBs as the input to the DNN classifier to make a better prediction for the tags. The pure blue area at some dimensions in Fig. 10 indicates the zero values produced by the ReLU activation function.

LABELS USED IN ANNOTATIONS

For each chunk, multi-label annotations were first obtained from each of 3 annotators. There are 4378 such chunks available, referred to as CHiME-Home-raw [20]; discrepancies between annotators are resolved by conducting a majority vote for each label. The annotations are based on a set of 7 label classes as shown in Table I. A detailed description of the annotation procedure is provided in [20]. To reduce uncertainty about annotations, evaluations are based on considering only those chunks where 2 or more annotators agreed about label presence across label classes. There are 1946 such chunks available, referred to as CHiME-Home-refined [20]. Another 816 refined chunks are kept for the final evaluation set of Task 4 of the DCASE 2016 challenge.
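Below is a minimal sketch of the label aggregation described above for one chunk, with annotations given as a 0/1 matrix of shape (annotators, labels). The majority vote follows the text directly. The `has_agreement` check encodes one possible reading of the refined-subset criterion, namely that at least two annotators gave identical label sets. That interpretation is an assumption, not a definition taken from [20].

```python
import numpy as np

def majority_vote(annotations) -> np.ndarray:
    """annotations: (n_annotators, n_labels) 0/1 array for one chunk.
    A label is kept if more than half of the annotators marked it present."""
    a = np.asarray(annotations)
    return (2 * a.sum(axis=0) > a.shape[0]).astype(int)

def has_agreement(annotations, min_agree: int = 2) -> bool:
    """Assumed reading of the 'refined' criterion: at least `min_agree`
    annotators produced exactly the same label set for the chunk."""
    rows = [tuple(int(v) for v in r) for r in np.asarray(annotations)]
    return max(rows.count(r) for r in rows) >= min_agree

ann = [[1, 0, 1, 0, 0, 0, 0],   # annotator 1
       [1, 0, 0, 0, 0, 0, 0],   # annotator 2
       [1, 1, 1, 0, 0, 0, 0]]   # annotator 3
print(majority_vote(ann), has_agreement(ann))
```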

THE NUMBER OF AUDIO CHUNKS FOR TRAINING AND TEST FOR THE DEVELOPMENT SET AND THE FINAL EVALUATION SET

EER COMPARISONS ON SEVEN LABELS AMONG THE PROPOSED ASYDAE-DNN, SYDAE-DNN, DNN BASELINE TRAINED ON MFB, DNN BASELINE TRAINED ON MFCC METHODS, YUN-MFCC-GMM [45], CAKIR-MFCC-CNN [19], LIDY-CQT-CNN [18], SVM TRAINED ON CHUNKS, SVM TRAINED ON FRAMES AND GMM METHODS [11], WHICH ARE EVALUATED ON THE DEVELOPMENT SET AND THE EVALUATION SET

EERs FOR FOLD 1 ACROSS SEVEN TAGS USING DNNs AND GMMs TRAINED ON THE ‘CHIME-HOME-RAW’ SET AND ‘CHIME-HOME-REFINED’ SET

... is that the performance was almost the same if there was no de-noising (or dropout) operation (denoted as aAE-ReLU50) in the ordinary auto-encoder. The reason is that the baseline DNN is well trained on MFBs with the binary cross-entropy as the loss function.

Zero-Shot Anomalous Sound Detection in Domestic Environments Using Large-Scale Pretrained Audio Pattern Recognition Models

by Giulio Zanetti

2024, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Anomalous sound detection is central to audio-based surveillance and monitoring. In a domestic environment, however, the classes of sounds to be considered anomalous are situation-dependent and cannot be determined in advance. At the same...

Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework

by Lâm Phạm

2024, Digital Signal Processing

This article proposes an encoder-decoder network model for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. We make use of multiple low-level spectrogram features at...

Convolutional Recurrent Neural Networks for Earthquake Epicentral Distance Estimation Using Single-Channel Seismic Waveform

by Yuanming LI

2024

We propose two deep neural network architectures for classification of arbitrary-length electrocardiogram (ECG) recordings and evaluate them on the atrial fibrillation (AF) classification data set provided by the PhysioNet/CinC Challenge...

Unsupervised Anomaly Detection on Temporal Multiway Data

by Kiên Đỗ

2024, 2020 IEEE Symposium Series on Computational Intelligence (SSCI)

Temporal anomaly detection looks for irregularities over space-time. Unsupervised temporal models employed thus far typically work on sequences of feature vectors, and much less on temporal multiway data. We focus our investigation on...

Brain subtle anomaly detection based on auto-encoders latent space analysis : application to de novo parkinson patients

by Michel Dojat

2024, arXiv (Cornell University)

Neural network-based anomaly detection remains challenging in clinical applications with little or no supervised information and subtle anomalies such as hardly visible brain lesions. Among unsupervised methods, patch-based autoencoders...

Fig. 1: The trained encoder extracts latent representation z of patches, used by 1) a decoder to compute reconstruction error in the image space 2) OC-SVM and 3) MMST to perform outlier detection in the latent space. Anomaly maps representing the percentage of abnormal voxels per brain structures are shown on the right, warm colors corresponding to the highest percentages.

Fig. 2: g-mean score of the 3 UAD and 2 CNN models. For UAD models, we consider anomaly % on the whole brain and per region, including the 8 subcortical structures from the MNI PD25 atlas: substantia nigra (SN), red nucleus (RN), subthalamic nucleus (STN), globus pallidus interna and externa (GPi, GPe), thalamus, putamen and caudate nucleus.

An Auto Encoder For Audio Dolphin Communication

by Denise Herzing

2024, arXiv (Cornell University)

Research in dolphin communication and cognition requires detailed inspection of audible dolphin signals. The manual analysis of these signals is cumbersome and time-consuming. We seek to automate parts of the analysis using modern deep...

Automated Antenna Testing Using Encoder-Decoder-based Anomaly Detection

by Jiawen Xu

2024, 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA)

We propose a new method for testing antenna arrays that records the radiating electromagnetic (EM) field using an absorbing material and evaluates the resulting thermal image series through an AI using a conditional encoder-decoder...

Fig. 1: Schema of automated antenna testing system using time series thermograms. The condition denotes the power and phase of the input signal to each array element. The time series thermograms contain blobs of various shape. Only the greenish blob pixels provide useful indications of anomalies, while the remaining pixels are background noise.

Fig. 2: Unrolled conditional CNN-LSTM VAE with contour-based detector. The reconstructed $\hat{x}_t$ is the mean of the observation model $p_\theta(x|z)$. Here a G-VAE model is presented to demonstrate the concept of time series thermogram anomaly detection. However, this framework is also applicable to the probabilistic VAE and the AE. For the probabilistic VAE, the decoder has another head to predict the variance per pixel. For the AE, the encoder requires only one linear head to learn a deterministic latent representation. The condition modeling modules are the same for all the models.

Fig. 4: Setup without insulation for this configuration of the patch antenna. The dimensions of the paper are set to 50 × 50 mm² with a thickness of 80 µm, allowing it to cover the whole aperture of the AUT while leaving some extra margin. A thermal camera, positioned at the focal distance of the lens, detects the change in temperature on the surface of the paper. The camera used in this experiment is a Flir A615 operating at room temperature. Since the Flir A615 is GenICam-compliant, any GenICam-compliant software with a driver could be used. In this experiment, we used the Stemmer Imaging GenICam driver for image acquisition and Halcon for processing the images from the camera. This whole system is insulated against conduction and convection heat loss by building it inside a climate chamber.

Fig. 5: (a) Ground truth (GT) from a representative image of a normal sequence; (b-e) reconstructions (first row) and residuals (second row) for PCVAE, CVAE with observation SD of 1, CVAE with observation SD of 0.01, and AE, resp.; (f) ground truth from a representative image of an anomalous sequence with the same configuration and time step as (a); (g-j) reconstructions and residuals from anomalous data.

Fig. 6: Visualization of log-variance for both the normal pattern and the anomalous pattern using PCVAE from Figs. 5a and 5f

Fig. 7: Example distribution of anomaly scores over time from 32 normal thermal image sequences and 32 anomalous sequences with the same configuration

Fig. 8: Receiver operating characteristic (ROC) curves over the entire test set
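The area under an ROC curve like the ones in Fig. 8 can be computed without drawing the curve, because it equals the Mann-Whitney rank statistic. Below is a minimal sketch, assuming anomaly scores where higher means more anomalous and label 1 marks an anomalous sequence. It builds an n_pos × n_neg comparison matrix, which is fine for modest test sets but wasteful for very large ones.

```python
import numpy as np

def roc_auc(scores, labels) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen anomalous sample (label 1) scores higher than a randomly chosen
    normal one (label 0); ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```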

TABLE II: Evaluation results of F-measure (F-M), sensitivity (Sn) and precision (Pr) for contour-based anomaly detector

Deep Correlation-Aware Kernelized Autoencoders for Anomaly Detection in Cybersecurity

by padmaksha roy

2024, arXiv (Cornell University)

Unsupervised learning-based anomaly detection in latent space has gained importance since discriminating anomalies from normal data becomes difficult in high-dimensional space. Both density estimation and distance-based methods to detect...

TASK 2 DCASE 2020: ANOMALOUS SOUND DETECTION USING UNSUPERVISED AND SEMI-SUPERVISED AUTOENCODERS AND GAMMATONE AUDIO REPRESENTATION Technical Report

by Pedro Zuccarello

2024

Anomalous sound detection (ASD) is one of the fields of machine listening attracting the most attention in the scientific community. Unsupervised detection is attracting a lot of interest due to its immediate applicability in many...

Unsupervised Detection of Anomalous Sound for Machine Condition Monitoring using Fully Connected U-Net

by Nguyen Chi Hieu B2307887

2024, Journal of ICT Research and Applications

Anomaly detection in the sound from machines is an important task in machine monitoring. An autoencoder architecture based on the reconstruction error using a log-Mel spectrogram feature is a conventional approach for this domain....
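The conventional baseline mentioned in the abstract scores a clip by how poorly an autoencoder trained on normal log-Mel frames reconstructs it. Below is a minimal sketch of that scoring logic. It uses PCA as a stand-in for a trained (linear) autoencoder so that the example stays short and deterministic; it is not the fully connected U-Net proposed in the paper. The class and function names and the synthetic data are hypothetical.

```python
import numpy as np

class LinearAutoencoder:
    """PCA used as a stand-in for a trained autoencoder (illustrative assumption):
    encode = project onto the top-k principal axes, decode = map back."""

    def __init__(self, n_components: int):
        self.k = n_components

    def fit(self, frames: np.ndarray) -> "LinearAutoencoder":
        # frames: (n_frames, n_mels) log-Mel features of normal machine sounds
        self.mean = frames.mean(axis=0)
        _, _, vt = np.linalg.svd(frames - self.mean, full_matrices=False)
        self.components = vt[: self.k]                   # (k, n_mels)
        return self

    def reconstruct(self, frames: np.ndarray) -> np.ndarray:
        code = (frames - self.mean) @ self.components.T  # bottleneck, (n_frames, k)
        return code @ self.components + self.mean

def anomaly_score(model: LinearAutoencoder, clip_frames: np.ndarray) -> float:
    """Mean squared reconstruction error over all frames of one clip."""
    err = clip_frames - model.reconstruct(clip_frames)
    return float(np.mean(err ** 2))

# Synthetic check: "normal" frames lie near a 2-D subspace, anomalous ones do not.
rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 64))
normal = rng.standard_normal((500, 2)) @ basis + 0.01 * rng.standard_normal((500, 64))
model = LinearAutoencoder(2).fit(normal)
test_normal = rng.standard_normal((50, 2)) @ basis + 0.01 * rng.standard_normal((50, 64))
test_anomalous = rng.standard_normal((50, 64))
print(anomaly_score(model, test_normal), anomaly_score(model, test_anomalous))
```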

Weighted and Multi-Task Loss for Rare Audio Event Detection

by Huy Phan

2024, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The version in the Kent Academic Repository may differ from the final published version. Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record.

Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework

by Huy Phan

2024, Digital Signal Processing

This article proposes an encoder-decoder network model for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. We make use of multiple low-level spectrogram features at...

ANOMALOUS SOUND DETECTION WITH MASKED AUTOREGRESSIVE FLOWS AND MACHINE TYPE DEPENDENT POSTPROCESSING Technical Report

by Verena Haunschmid

2024

This technical report describes the submission from the CP JKU/SCCH team for Task 2 of the DCASE2020 challenge Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. Our approach uses a Masked Autoregressive Flow...

Semi-supervised Learning for Marked Temporal Point Processes

by Shivshankar Reddy

2024, ArXiv

Temporal Point Processes (TPPs) are often used to represent the sequence of events ordered as per the time of occurrence. Owing to their flexible nature, TPPs have been used to model different scenarios and have shown applicability in...

Anomaly Detection in Electromechanical Systems by means of Deep-Autoencoder

by Miguel Delgado Prieto

2024, 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)

Anomaly detection in manufacturing processes is one of the main concerns in the new era of the Industry 4.0 framework. The detection of uncharacterized events represents a major challenge within the operation monitoring of electrical...

Automatic Identification of Bird Species from Audio

by Silvestre Carvalho

2024, Intelligent Information and Database Systems

Bird species identification is a relevant and time-consuming task for ornithologists and ecologists. With growing amounts of audio annotated data, automatic bird classification using machine learning techniques is an important trend in...

Automatic Identification of Bird Species from Audio

by Silvestre Carvalho

2024, Lecture Notes in Computer Science

Bird species identification is a relevant and time-consuming task for ornithologists and ecologists. With growing amounts of audio annotated data, automatic bird classification using machine learning techniques is an important trend in...

Before the feature extraction, the audio data can be further improved, using multithreading support to speed up the pre-processing. This process starts with converting the audio channels from stereo to mono. Then, an envelope filter can be applied to the audio data. This envelope basically removes parts of the audio that are below a certain threshold and can be considered background noise. Fig. 1 presents the padding on a signal (of the species Black-tailed Gnatcatcher) to be used on a split with a window length of 3 s. Half of the window length (1.5 s) is added to the beginning and the end of the signal. The Butterworth bandpass filter [22] is used to remove frequencies outside of the bird vocalization range. The value used for the lowest frequency cut is 1500 Hz, and 8000 Hz for the highest. While testing the pre-processing of the audio data, the Butterworth bandpass filter attenuated most of the noise, such as rain or wind, which in some cases makes a big difference, both visually and audibly, to the clarity of the bird chirps. In Fig. 1 we can also see the result of the Butterworth bandpass filter applied to the signal. The final step is splitting the audio data into multiple samples. This process begins by finding the peaks in the sound. These peaks are obtained using a function from the Python library SciPy. This function has user-definable parameters such as peak threshold and minimum distance between peaks. After getting a list of peaks, the audio data is then split into multiple samples, centered at the peaks, all with the same length of 3 s.
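The pre-processing chain described above can be sketched as follows. The sketch assumes NumPy/SciPy arrays and assumes the unnamed SciPy peak function is `scipy.signal.find_peaks`. The filter order, peak height and minimum peak spacing are hypothetical values, and the envelope (silence-threshold) step is omitted for brevity. The sample rate must exceed 16 kHz for the 8000 Hz upper cut-off to be valid. Windows are cut from the padded signal, so a peak near either end still yields a full 3 s sample.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, find_peaks

def preprocess(audio: np.ndarray, sr: int, window_s: float = 3.0,
               low_hz: float = 1500.0, high_hz: float = 8000.0,
               peak_height: float = 0.1, min_peak_gap_s: float = 1.0) -> list[np.ndarray]:
    # 1. Stereo -> mono by averaging channels (audio: (n,) or (n, channels)).
    if audio.ndim == 2:
        audio = audio.mean(axis=1)
    # 2. Butterworth band-pass keeping the bird-vocalization range (order 4 assumed).
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, audio)
    # 3. Pad half a window (1.5 s for 3 s windows) at the start and end.
    half = int(window_s * sr / 2)
    padded = np.pad(filtered, half)
    # 4. Peaks of the absolute amplitude; height and spacing are tunable.
    peaks, _ = find_peaks(np.abs(padded), height=peak_height,
                          distance=max(1, int(min_peak_gap_s * sr)))
    # 5. Fixed-length windows centred on each peak.
    win = 2 * half
    return [padded[p - half:p - half + win] for p in peaks
            if p - half >= 0 and p - half + win <= len(padded)]

# Usage: 5 s of silence with two short 3 kHz "chirps" at 1.0 s and 3.5 s.
sr = 22050
x = np.zeros(5 * sr)
burst = np.sin(2 * np.pi * 3000.0 * np.arange(int(0.1 * sr)) / sr)
for start_s in (1.0, 3.5):
    i = int(start_s * sr)
    x[i:i + len(burst)] = burst
segments = preprocess(np.stack([x, x], axis=1), sr)
print(len(segments), [len(s) / sr for s in segments])
```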

Fig. 4. Extracted MFCC (left) and Mel Spectrogram (right)

Fig. 5. Convolutional Neural Network - Model Architecture

As we can see in Fig. 5, between each convolutional and pooling layer pair, a batch normalization layer is added to reduce the amount of shift in the values of the hidden layers, increase the learning speed, and reduce overfitting [29]. After a flatten layer, a dropout layer with a rate of 50% is added to reduce overfitting. The last layer is the fully connected layer, which is a dense layer with the softmax activation. In total, this model has 933211 trainable parameters.

Fig. 6. Long Short-Term Memory - Model Architecture

The Convolutional Recurrent Neural Network - Gated Recurrent Unit (CRNN-GRU) model architecture is based on the presented CRNN-LSTM, but with the LSTM layer switched to a GRU layer with double the number of kernel units. The second-to-last Dense layer also has double the number of kernel units. In Fig. 9 we present the implemented CRNN-GRU model architecture. In total, this model has 15065915 trainable parameters.

Moving Object Detection in Noisy Video Sequences Using Deep Convolutional Disentangled Representations

by Ezequiel López-Rubio

2024, 2022 IEEE International Conference on Image Processing (ICIP)

Noise robustness is crucial when approaching a moving detection problem since image noise is easily mistaken for movement. In order to deal with the noise, deep denoising autoencoders are commonly proposed to be applied on image patches...

Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework

by Trực Nguyễn

2024, Digital Signal Processing

This article proposes an encoder-decoder network model for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. We make use of multiple low-level spectrogram features at...

Comparison of Feature Selection via Semi supervised denoising autoencoder and traditional approaches For Software Fault-prone Classification

by latifa rabai

2024

Software quality is the capability of a software process to produce software product satisfying the end user. The quality of process or product entities is described through a set of attributes that may be internal or external. For the...

Filteraugment: An Acoustic Environmental Data Augmentation Method

by Yong-Hwa Park

2024, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Acoustic environments affect the acoustic characteristics of the sound to be recognized through physical interaction with sound wave propagation. Thus, training acoustic models for audio and speech tasks requires regularization on various...
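The abstract is truncated before the method itself. As we understand FilterAugment, it regularizes training by applying random gains to random frequency bands of the input spectrogram, imitating the filtering effect of different acoustic environments. Below is a simplified, step-type (piecewise-constant) sketch on a dB-scaled mel spectrogram. The band-count and gain ranges are hypothetical defaults, not the paper's settings.

```python
import numpy as np

def filter_augment(log_mel: np.ndarray, rng: np.random.Generator,
                   n_bands_range: tuple[int, int] = (3, 6),
                   db_range: tuple[float, float] = (-6.0, 6.0)) -> np.ndarray:
    """Add a random piecewise-constant gain (in dB) along the frequency axis.

    log_mel: (n_mels, n_frames) spectrogram in dB. The number of bands, their
    boundaries and their gains are drawn at random; the ranges are hypothetical.
    """
    n_mels = log_mel.shape[0]
    n_bands = int(rng.integers(n_bands_range[0], n_bands_range[1] + 1))
    # Random interior band edges, plus both ends of the frequency axis.
    inner = np.sort(rng.choice(np.arange(1, n_mels), size=n_bands - 1, replace=False))
    edges = np.concatenate(([0], inner, [n_mels]))
    gains_db = rng.uniform(db_range[0], db_range[1], size=n_bands)
    gain_per_bin = np.repeat(gains_db, np.diff(edges))  # (n_mels,)
    # Adding a dB offset equals multiplying the power spectrum by a filter gain.
    return log_mel + gain_per_bin[:, None]

rng = np.random.default_rng(0)
augmented = filter_augment(np.zeros((64, 100)), rng)
print(np.unique(augmented[:, 0]))
```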

Time Series Encodings with Temporal Convolutional Networks

by Wolfgang Konen

2024, Springer eBooks

The training of anomaly detection models usually requires labeled data. We present in this paper a novel approach for anomaly detection in time series which trains unsupervised using a convolutional approach coupled to an autoencoder...

Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework

by tri nguyen

2024, Digital Signal Processing

This article proposes an encoder-decoder network model for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. We make use of multiple low-level spectrogram features at...

An Ensemble Approach to Unsupervised Anomalous Sound Detection

by Abderrahim Fathan

2024

The task of anomalous sound detection (ASD) is to determine whether an observed sound is anomalous or normal. Both supervised and unsupervised approaches can be adopted for the ASD task. In the supervised approach, anomalous and normal data are...
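The abstract is cut off before describing how the ensemble combines its members. Below is a minimal sketch of one common pattern, assuming each detector outputs a scalar anomaly score per clip. Each detector's scores are normalized with statistics from normal clips, so their scales become comparable, and the results are then averaged. The z-score normalization and optional weights are hypothetical choices, not the paper's method.

```python
import numpy as np

def zscore_params(normal_scores: np.ndarray) -> tuple[float, float]:
    """Mean and std of one detector's scores on normal (training/validation) clips."""
    return float(np.mean(normal_scores)), float(np.std(normal_scores) + 1e-12)

def ensemble_scores(test_scores: list[np.ndarray], normal_scores: list[np.ndarray],
                    weights=None) -> np.ndarray:
    """Hypothetical ensemble: z-normalize each detector, then take a weighted mean."""
    normed = []
    for s, ref in zip(test_scores, normal_scores):
        mu, sd = zscore_params(ref)
        normed.append((np.asarray(s, dtype=float) - mu) / sd)
    normed = np.stack(normed)                          # (n_detectors, n_clips)
    w = np.ones(len(normed)) if weights is None else np.asarray(weights, dtype=float)
    return (w[:, None] * normed).sum(axis=0) / w.sum()

# Two detectors on very different scales agree once normalized.
ens = ensemble_scores([np.array([1.0, 3.0]), np.array([100.0, 300.0])],
                      [np.array([0.0, 1.0, 2.0]), np.array([0.0, 100.0, 200.0])])
print(ens)
```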