Academia.edu

Audio Classification

288 papers
76 followers
About this topic
Audio classification is a subfield of machine learning and signal processing that involves the automatic categorization of audio signals into predefined classes based on their features. It utilizes algorithms to analyze audio data, enabling applications such as speech recognition, music genre classification, and environmental sound identification.

Key research themes

1. How can feature extraction and dimensionality reduction improve accuracy in music genre and audio type classification?

This theme investigates the development and application of advanced feature extraction methods combined with dimensionality reduction techniques to enhance audio classification accuracy, particularly in music genre classification and speech/music discrimination. The focus lies on capturing relevant audio characteristics through timbral, spectral, and rhythmic features and optimizing their representation in reduced dimension spaces that preserve class-distinguishing information, facilitating more effective classification algorithms.

Key finding: This study introduced a nonlinear dimensionality reduction technique, Diffusion Maps, applied on timbral texture features for music genre classification. It improved classification accuracy dramatically, achieving 97%... Read more
Key finding: The research evaluated 13 distinct features related to temporal and spectral characteristics such as 4 Hz modulation energy, spectral rolloff, spectral centroid, spectral flux, and zero-crossing rate, and combined them using... Read more
Key finding: The paper proposed a hybrid classification strategy combining bagged Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) using features like Mel-frequency cepstral coefficients (MFCCs) for four-class audio... Read more
Key finding: The study applied Support Vector Machines (SVMs) as a robust classification technique on features derived from environmental sounds including speech, door claps, and alarms in a domotic setting. Its methodological rigor in... Read more
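Several of the spectral and temporal features named above (zero-crossing rate, spectral centroid, spectral rolloff) can be computed per frame in a few lines. The following is a minimal NumPy sketch, not the method of any listed paper: the function names, the 0.85 rolloff fraction, and the use of raw magnitude spectra are assumptions made for illustration.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = magnitudes.sum()
    return float((freqs * magnitudes).sum() / total) if total > 0 else 0.0

def spectral_rolloff(frame, sample_rate, fraction=0.85):
    """Lowest frequency below which `fraction` of the spectral magnitude lies.

    The 0.85 default is an assumed, commonly used value.
    """
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    cumulative = np.cumsum(magnitudes)
    if cumulative[-1] == 0:
        return 0.0
    idx = np.searchsorted(cumulative, fraction * cumulative[-1])
    return float(freqs[idx])

# Usage: a 1 kHz sine sampled at 8 kHz (800 samples = 100 full cycles).
sample_rate = 8000
t = np.arange(800) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t + 0.1)
features = np.array([
    zero_crossing_rate(tone),
    spectral_centroid(tone, sample_rate),
    spectral_rolloff(tone, sample_rate),
])
```

In a classification pipeline, vectors like `features` would be computed for every frame and then passed to a dimensionality reduction step and a classifier; those stages are left out of this sketch.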

2. What roles do binaural and spatial features play in classifying complex acoustic scenes and spatial audio recordings?

This research theme focuses on the extraction and utilization of binaural spatial cues and spectro-temporal features for the classification of spatial audio scenes recorded with binaural setups. It addresses the classification of complex environments and sound distributions around a listener, which is essential for applications in virtual reality, audio indexing, and scene analysis. The studies explore feature selection, classifier performance, and challenges related to reverberation and source ambiguity in acoustically rich settings.

Key finding: The study demonstrated that binaural cues combined with Mel-frequency cepstral coefficients (MFCCs) enable classification of different spatial audio scenes with accuracies up to 98% on binaural room impulse response (BRIR)... Read more
Key finding: By extracting over a thousand spatial and spectro-temporal features from binaural signals, the study showed the superior influence of spectro-temporal features over spatial-only metrics for classification accuracy. Using... Read more
Key finding: Though primarily focused on source separation, this work indirectly relates by enhancing the extraction of instrument-specific audio features which can be spatialized through binaural and multi-channel processing. Using... Read more
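To make "binaural cues" concrete, the sketch below estimates two standard ones, the interaural level difference (ILD) and the interaural time difference (ITD), from a pair of ear signals. Whether the listed studies use exactly these cues is an assumption; the function names and the full-band, whole-signal computation are illustrative simplifications (practical systems typically work per frequency band and per frame).

```python
import numpy as np

def interaural_level_difference(left, right, eps=1e-12):
    """ILD in dB; positive when the left channel carries more energy."""
    e_left = np.mean(np.square(left))
    e_right = np.mean(np.square(right))
    return float(10.0 * np.log10((e_left + eps) / (e_right + eps)))

def interaural_time_difference(left, right, sample_rate):
    """ITD in seconds, taken from the peak of the full cross-correlation.

    Positive means the left channel lags, i.e. the sound reached
    the right ear first.
    """
    corr = np.correlate(left, right, mode="full")
    # Index 0 of `corr` corresponds to a lag of -(len(right) - 1) samples.
    lag = int(np.argmax(corr)) - (len(right) - 1)
    return lag / sample_rate

# Usage: a noise source that reaches the left ear 5 samples late
# and at half the amplitude.
sample_rate = 48000
rng = np.random.default_rng(0)
right = rng.standard_normal(1000)
left = 0.5 * np.concatenate([np.zeros(5), right[:-5]])
ild_db = interaural_level_difference(left, right)
itd_s = interaural_time_difference(left, right, sample_rate)
```

Here `itd_s` corresponds to a 5-sample delay and `ild_db` is roughly -6 dB, reflecting the halved amplitude at the left ear.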

3. How are deep learning and neuromorphic approaches advancing audio event classification and bioacoustic signal recognition?

This theme examines the shift towards deep learning architectures, particularly convolutional neural networks (CNNs), and emerging neuromorphic computing techniques including spiking neural networks (SNNs) in audio event detection, environmental sound classification, and bioacoustic signal analysis. The focus lies in leveraging biologically inspired models and data-driven feature representations for improved robustness, scalability, and real-time processing capabilities across diverse audio classification tasks.

Key finding: The paper showed that treating Mel-scale filter bank features concatenated over frames as images input to CNNs led to an audio event classification accuracy of 81.5% across thirty classes including dog barks and sirens,... Read more
Key finding: This survey highlighted the growing adoption of machine learning, especially ensemble methods and CNNs, in bioacoustic and general acoustic classification. It revealed that deep learning architectures have improved... Read more
Key finding: This comprehensive survey underscored the promise of neuromorphic computing platforms based on spiking neural networks for audio classification, detailing advantages such as energy efficiency, real-time event-based... Read more
Key finding: The paper presents a practical implementation of environmental sound classification applying neural networks trained on MFCC feature sets, illustrating that convolutional and fully connected networks can effectively... Read more
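The idea of treating filter-bank features concatenated over frames as image-like CNN inputs can be sketched as a simple patch-extraction step. This is a hedged illustration only: `stack_frames`, the context length, and the hop size are hypothetical choices, and the feature matrix is assumed to be precomputed with shape (frames, bands).

```python
import numpy as np

def stack_frames(features, context, hop):
    """Cut a (n_frames, n_bands) matrix into overlapping 2-D patches.

    Returns an array of shape (n_patches, context, n_bands); each patch
    can be fed to a CNN as a single-channel image.
    """
    n_frames, n_bands = features.shape
    starts = range(0, n_frames - context + 1, hop)
    patches = [features[s:s + context] for s in starts]
    if not patches:
        return np.empty((0, context, n_bands), dtype=features.dtype)
    return np.stack(patches)

# Usage: 20 frames of 4-band features, 8-frame patches every 4 frames.
features = np.arange(20 * 4, dtype=float).reshape(20, 4)
patches = stack_frames(features, context=8, hop=4)
```

With these values the call yields four patches of shape (8, 4); a smaller hop gives more, heavily overlapping training examples at the cost of redundancy.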

All papers in Audio Classification

In this paper, we address the problem of classification of continuous general audio data (GAD) for content-based retrieval, and describe a scheme that is able to classify audio segments into seven categories consisting of silence, single... more
We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech... more
In this paper we present a study on music mood classification using audio and lyrics information. The mood of a song is expressed by means of musical features but a relevant part also seems to be conveyed by the lyrics. We evaluate each... more
We propose a time series analysis based approach for systematic choice of audio classes for detection of crimes in elevators in 1. Since all the different sounds in a surveillance environment cannot be anticipated, a surveillance system... more
In the age of digital information, audio data has become an important part of many modern computer applications. Audio classification has become a focus of research in audio processing and pattern recognition. Automatic audio... more
Nowadays, it appears essential to design automatic indexing tools which provide meaningful and efficient means to describe the musical audio content. There is in fact a growing interest for music information retrieval (MIR) applications... more
In the context of content-based multimedia indexing, gender identification using the speech signal is an important task. Existing techniques are dependent on the quality of the speech signal, making them unsuitable for the video indexing... more
In this paper, music genre classification is addressed in a multilinear perspective. Inspired by a model of auditory cortical processing, multiscale spectro-temporal modulation features are extracted. Such spectro-temporal modulation... more
During a music performance, the musician adds expressiveness to the musical message by changing timing, dynamics, and timbre of the musical events to communicate an expressive intention. Traditionally, the analysis of music expression is... more
The paper deals with the design of a sound recognition system focused on an ultra-low-power hardware implementation in a button-like miniature form factor. We present the results of the first design phase focused on selection and... more
Automatic discrimination of speech and music is an important tool in many multimedia applications. The paper presents a robust and effective approach for speech/music discrimination, which relies on a set of features derived from... more
We present a comparison of six methods for classification of sports audio. For the feature extraction we have two choices: MPEG-7 audio features and Mel-scale Frequency Cepstrum Coefficients (MFCC). For the classification we also have two... more
Audio classification typically involves feeding a fixed set of low-level features to a machine learning method, then performing feature aggregation before or after learning. Instead, we jointly learn a selection and hierarchical temporal... more
Most speech/music discrimination techniques proposed in the literature need a large amount of training data in order to provide acceptable results. Besides, they are usually context-dependent. In this paper, we propose a novel... more
Nonlinear distortions pose a serious problem for the quality preservation of audio and speech signals. To address this problem, such signals are processed by nonlinear models. Functional link adaptive filter (FLAF) is a... more
This paper presents the results of the application of a feature selection procedure to an automatic music genre classification system. The classification system is based on the use of multiple feature vectors and an ensemble approach,... more
In audio content analysis, the discrimination of speech and non-speech is the first processing step before speaker segmentation and recognition, or speech transcription. Speech/non-speech segmentation algorithms usually consist of a frame... more
Automatic media content analysis and understanding for efficient topic searching and browsing are current challenges in the management of e-learning content repositories. This paper presents our current work on analyzing and... more
With the rapid growth in audio data volume, research in the area of content-based audio retrieval has gained impetus in the last decade. Audio classification serves as the fundamental step towards it. Accuracy in classifying data relies... more
by Mirco Ravanelli and 1 more
Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian-Mixture-Models have had some success in classifying a... more
The importance of automatic discrimination between speech signals and music signals has evolved as a research topic over recent years. The need to classify audio into categories such as speech or music is an important aspect of many... more
Audio classification is one of the most important tasks in content-based analysis and can be implemented in many audio applications, such as indexing and retrieving. This paper addresses the problem of broadcast news audio classification,... more
Multimedia Event Detection (MED) aims to identify events—also called scenes—in videos, such as a flash mob or a wedding ceremony. Audio content information complements cues such as visual content and text. In this paper, we explore the... more
We discuss the meaning and significance of the video mining problem, and present our work on some aspects of video mining. A simple definition of video mining is unsupervised discovery of patterns in audio-visual content. Such purely... more
In this paper we describe new methods to detect semantic concepts from digital video based on audible and visual content. Temporal Gradient Correlogram captures temporal correlations of gradient edge directions from sampled shot frames.... more
This paper presents a system designed for the management of multimedia databases that embarks upon the problem of efficient media processing and representation for automatic semantic classification and modelling. Its objectives are... more
The automatic classification of musical genre from audio signals has been a topic of active research in recent years. Although the identification of genre is a subjective task that likely involves high-level musical attributes such as... more
Specific sounds such as applause, laughter, music, environmental noise, etc. are very helpful for understanding the high-level semantics of multimedia content. The detection of such key sounds is one of the challenges in intelligent management of... more
Acoustic events are a rich source of information for context-awareness and support various application areas, such as audio surveillance [1], sound sensing [2], intelligent auditory interfaces [3] and speech localization. Acoustic... more
Every audio application, including music/speech separation, speech or speaker recognition, and audio transcription, requires a preprocessing stage that determines which class each frame belongs to, namely: speech only, music only... more
Automatic audio classification usually considers sounds as music, speech, silence or noise, but work on the noise class is rare. Audio features are generally specific to speech or music signals. In this paper, we present a new audio... more
Audio classification serves as the fundamental step towards applications like content-based audio retrieval. In this work, we have tried to exploit the inherent difference in the composition of speech and music signals. A music signal has... more
Automatically extracting semantic content from audio streams can be helpful in many multimedia applications. In this paper, we introduce a framework for automatic feature subspace selection from a common feature vector. The selected... more
The automatic identification of musical instrument timbres occurring in a recording of music has many applications, including music search by timbre, music recommender systems and transcribers. A major difficulty is that most music is... more
In this paper, a new approach for automatic audio classification using non-negative matrix factorization (NMF) is presented. Training is performed on each audio class individually, whilst during the test phase each test recording is... more
There is an increasing need for automatically classifying sounds for MIR and interactive music applications. In the context of supervised classification, we describe an approach that improves the performance of the general bag-of-frame... more
In this paper we study the efficiency of support vector machines (SVM) with alignment kernels in audio classification. The classification task chosen is music instrument recognition. The alignment kernels have the advantage of handling... more
In this paper we report the experimental results obtained when applying a mixture of experts to the problem of audio classification for multimedia applications. The mixture of experts is based on neural networks as individual experts and... more
In this paper we describe a system we have developed for automatic broadcast-quality video indexing that successfully combines results from the fields of speaker verification, acoustic analysis, very large vocabulary speech recognition,... more
There is an increasing need for automatically classifying sounds for MIR and interactive music applications. In the context of supervised classification, we conducted experiments with so-called analytical features, an approach that... more
Currently, robotic systems employ almost exclusively global sensor information for navigation purposes. While a global map facilitates planning, it may have insufficient quality. Especially with autonomous robots, additional information... more
In this paper we present an algorithm for automatic classification of sound into speech, instrumental sound/music and silence. The method is based on thresholding of features derived from the modulation envelope of the frequency-limited... more
A method for content-based audio classification is presented. In particular we focus on identification of musical instrument sounds based on timbre classification, using a biologically plausible feature extraction technique called... more
Rapid advancement in computers and internet technology has led to a large volume of multimedia files. The archiving and digitization of old media content also contribute to the growth of the digital library. The usefulness of these... more
In this paper, classification of audio sources is presented to supplement current work on an existing system for localization of audio sources. The question of achieving the audio classification lies in the convenient discrimination of the... more
J-DSP is a Java-based object-oriented online programming environment developed at Arizona State University for education and research. This paper presents a collection of interactive Java modules for the purpose of introducing... more