Academia.edu

Audio Classification

288 papers
76 followers
About this topic
Audio classification is a subfield of machine learning and signal processing that involves the automatic categorization of audio signals into predefined classes based on their features. It utilizes algorithms to analyze audio data, enabling applications such as speech recognition, music genre classification, and environmental sound identification.

Key research themes

1. How can feature extraction and dimensionality reduction improve accuracy in music genre and audio type classification?

This theme investigates the development and application of advanced feature extraction methods combined with dimensionality reduction techniques to enhance audio classification accuracy, particularly in music genre classification and speech/music discrimination. The focus lies on capturing relevant audio characteristics through timbral, spectral, and rhythmic features and optimizing their representation in reduced dimension spaces that preserve class-distinguishing information, facilitating more effective classification algorithms.

Key finding: This study introduced a nonlinear dimensionality reduction technique, Diffusion Maps, applied on timbral texture features for music genre classification. It improved classification accuracy dramatically, achieving 97%... Read more
Key finding: The research evaluated 13 distinct features related to temporal and spectral characteristics such as 4 Hz modulation energy, spectral rolloff, spectral centroid, spectral flux, and zero-crossing rate, and combined them using... Read more
Key finding: The paper proposed a hybrid classification strategy combining bagged Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) using features like Mel-frequency cepstral coefficients (MFCCs) for four-class audio... Read more
Key finding: The study applied Support Vector Machines (SVMs) as a robust classification technique on features derived from environmental sounds including speech, door claps, and alarms in a domotic setting. Its methodological rigor in... Read more
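Several of the features named in the findings above (zero-crossing rate, spectral centroid, spectral rolloff, spectral flux) have common textbook definitions that can be computed on a single short frame. The following is a minimal sketch under those textbook definitions; it is not the implementation used in any of the cited studies. The function names, the 0.85 rolloff fraction, the naive DFT and the 1000 Hz test tone are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, n, fraction=0.85):
    """Lowest bin frequency below which `fraction` of the total magnitude lies."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k * sample_rate / n
    return (len(mags) - 1) * sample_rate / n


def spectral_flux(prev_mags, mags):
    """Squared Euclidean distance between two consecutive magnitude spectra."""
    return sum((b - a) ** 2 for a, b in zip(prev_mags, mags))


# Illustrative input (an assumption): a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
n = 256
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate + 0.3) for i in range(n)]
mags = magnitude_spectrum(frame)
zcr = zero_crossing_rate(frame)
centroid = spectral_centroid(mags, sample_rate, n)
rolloff = spectral_rolloff(mags, sample_rate, n)
flux = spectral_flux(mags, mags)  # identical spectra, so no change
```

For a pure tone, the centroid and rolloff both sit at the tone frequency and the zero-crossing rate is close to twice the tone frequency divided by the sample rate. Speech and music frames spread these values out, which is what makes them usable as discriminating features.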

2. What roles do binaural and spatial features play in classifying complex acoustic scenes and spatial audio recordings?

This research theme focuses on the extraction and utilization of binaural spatial cues and spectro-temporal features for the classification of spatial audio scenes recorded with binaural setups. It addresses the classification of complex environments and sound distributions around a listener, which is essential for applications in virtual reality, audio indexing, and scene analysis. The studies explore feature selection, classifier performance, and challenges related to reverberation and source ambiguity in acoustically rich settings.

Key finding: The study demonstrated that binaural cues combined with Mel-frequency cepstral coefficients (MFCCs) enable classification of different spatial audio scenes with accuracies up to 98% on binaural room impulse response (BRIR)... Read more
Key finding: By extracting over a thousand spatial and spectro-temporal features from binaural signals, the study showed the superior influence of spectro-temporal features over spatial-only metrics for classification accuracy. Using... Read more
Key finding: Though primarily focused on source separation, this work indirectly relates by enhancing the extraction of instrument-specific audio features which can be spatialized through binaural and multi-channel processing. Using... Read more
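The findings above refer to binaural cues without listing them. Two cues that are widely used in spatial hearing research are the interaural level difference (ILD) and the interaural time difference (ITD). The sketch below assumes those two cues and estimates them from a left/right channel pair; whether the cited studies used exactly these cues is not stated in the summaries. The function names, the sign conventions and the synthetic Gaussian pulse are assumptions for illustration.

```python
import math


def interaural_level_difference(left, right):
    """ILD in dB: energy of the left channel relative to the right."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def interaural_time_difference(left, right, sample_rate, max_lag):
    """ITD in seconds, taken at the lag that maximises the cross-correlation.

    A positive value means the right channel lags the left channel.
    """
    def xcorr(lag):
        return sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )

    best_lag = max(range(-max_lag, max_lag + 1), key=xcorr)
    return best_lag / sample_rate


# Synthetic example (an assumption): the right ear receives the same pulse
# 5 samples later and at half the amplitude.
sample_rate = 48000
delay = 5
left = [math.exp(-(((i - 50) / 10.0) ** 2)) for i in range(200)]
right = [0.5 * left[i - delay] if i >= delay else 0.0 for i in range(200)]

ild_db = interaural_level_difference(left, right)
itd_s = interaural_time_difference(left, right, sample_rate, max_lag=20)
```

In practice such cues are usually computed per frequency band and per frame, then combined with spectral features such as MFCCs before classification.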

3. How are deep learning and neuromorphic approaches advancing audio event classification and bioacoustic signal recognition?

This theme examines the shift towards deep learning architectures, particularly convolutional neural networks (CNNs), and emerging neuromorphic computing techniques including spiking neural networks (SNNs) in audio event detection, environmental sound classification, and bioacoustic signal analysis. The focus lies in leveraging biologically inspired models and data-driven feature representations for improved robustness, scalability, and real-time processing capabilities across diverse audio classification tasks.

Key finding: The paper showed that treating Mel-scale filter bank features concatenated over frames as images input to CNNs led to an audio event classification accuracy of 81.5% across thirty classes including dog barks and sirens,... Read more
Key finding: This survey highlighted the growing adoption of machine learning, especially ensemble methods and CNNs, in bioacoustic and general acoustic classification. It revealed that deep learning architectures have improved... Read more
Key finding: This comprehensive survey underscored the promise of neuromorphic computing platforms based on spiking neural networks for audio classification, detailing advantages such as energy efficiency, real-time event-based... Read more
Key finding: The paper presents a practical implementation of environmental sound classification applying neural networks trained on MFCC feature sets, illustrating that convolutional and fully connected networks can effectively... Read more
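The first finding above treats Mel-scale filter bank features, concatenated over frames, as an image for a CNN. The sketch below shows one way such a time-by-band array could be built; the CNN itself is not shown. The common 2595·log10(1 + f/700) mel formula, the triangular filters, the Hann window, the frame sizes and every function name are assumptions, not details taken from the paper.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 (naive O(N^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def log_mel_image(signal, sample_rate, frame_len, hop, n_filters):
    """Rows are frames (time), columns are mel bands (frequency)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        windowed = [x * w for x, w in zip(signal[start:start + frame_len], hann)]
        spec = power_spectrum(windowed)
        # Small constant avoids log(0) in empty bands.
        image.append([math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank])
    return image


# Illustrative input (an assumption): a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(1024)]
image = log_mel_image(signal, sample_rate, frame_len=128, hop=64, n_filters=10)
bank = mel_filterbank(10, 128, sample_rate)
loudest_band = [row.index(max(row)) for row in image]
```

The resulting two-dimensional array can be handed to an image-style network as a single-channel input, which is the idea the finding describes.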

All papers in Audio Classification

In the field of audio classification, audio signals may be broadly divided into three classes: speech, music and events. Most studies, however, neglect that real audio soundtracks can have any combination of these classes simultaneously.... more
In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train an SVM for automated animal audio classification. The animal audio datasets used are (i)... more
This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The whole normalized power spectrum (NPS) is directly involved in the proposed process, avoiding complex and hazardous... more
This paper presents work on changepoint detection in musical audio signals, focusing on the case where there are note changes with low associated energy variation. Several methods are described and results of the best are presented.
A real-time pitch modification system has been developed. The implemented processing scheme is based on hybrid deterministic/stochastic decomposition of the signal and includes extraction of instantaneous pitch, pitch-synchronous... more
Audio classification is paramount in a variety of applications including surveillance, healthcare monitoring, and environmental analysis. Traditional methods frequently depend on intricate signal processing algorithms and manually crafted... more
Hearing a species in a tropical rainforest is much easier than seeing it. Someone in the forest might not be able to look around and see every type of bird and frog that is there, but they can be heard. A forest ranger might... more
Audio classification is the process of assigning a particular class to an audio signal. Classifying audio signals has many applications in fields such as digital libraries, automatic organization of databases etc. In the last several years... more
This work is devoted to the problem of automatic speech and music discrimination. As we will see here, speech and music signals have quite distinctive features. However, the efficient distinction between speech and music is still an open... more
Speech signals are the primary source of direct transmitter-to-receiver human communication and fall in the category of acoustic signals. These signals are mechanical waves represented as analog signals and propagate as... more
The sequence of strings played on a bowed string instrument is essential to understanding the fingering. Thus, its estimation is required for machine understanding of violin playing. Audio-based identification is the only viable way to... more
In the age of digital information, audio data has become an important part of many modern computer applications. Audio classification has become a focus of research in audio processing and pattern recognition. Automatic audio... more
It is both challenging and desirable to be able to retrieve sound files relevant to users' interests by searching the Internet. Unlike the traditional way of using keywords as input to search for web pages with relevant texts, query... more
Speech Emotion Recognition (SER) is a method where computers learn to recognize human emotions from speech to improve communication. In this study, we present an innovative Bangla SER framework, incorporating data augmentations, feature... more
In recent years, deep learning approaches have gained significant interest as a way of building hierarchical representations from unlabeled data. However, to our knowledge, these deep learning approaches have not been extensively studied... more
Many different short-time features, using time windows in the size of 10-30 ms, have been proposed for music segmentation, retrieval and genre classification. However, often the available time frame of the music to make the actual... more
This paper presents a method to classify audio-video data into one of five classes: advertisement, cartoon, news, movie and songs. Automatic audio-video classification is very useful to audio-video indexing, content based audio-video... more
To assist clinicians in the differential diagnosis and treatment of motor speech disorders, it is imperative to establish objective tools which can reliably characterize different subtypes of disorders such as apraxia of speech (AoS) and... more
In this paper, we describe an efficient method for audio matching which performs effectively for a wide range of classical music. The basic goal of audio matching can be described as follows: consider an audio database containing several... more
Modern streaming services are increasingly labeling videos based on their visual or audio content. This typically augments the use of technologies such as AI and ML by allowing natural speech to be used for searching by keywords and video... more
The version in the Kent Academic Repository may differ from the final published version. Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record.
Automatic discrimination of speech and music is an important tool in many multimedia applications. Previous work has focused on using long-term features such as differential parameters, variances, and time-averages of spectral parameters.... more
Speech and audio processing techniques are used along with statistical pattern recognition principles to solve the problem of musical instrument recognition. Only non-temporal, frame-level features are used so that the proposed system is... more
This report describes the use of a support vector machine with a novel kernel to determine the breathing rate and inhalation duration of a firefighter wearing a Self-Contained Breathing Apparatus. With this information, an incident... more
Next-generation audio services require audio file editing tools that make it possible to handle such a file as simply as a text file. File indexing is a solution that provides access... more
Script identification is a challenging task in bilingual or multilingual optical character recognition systems. Remarkable research work on script identification has been noted in Indian or non-Indian contexts. As many commercial and... more
Contemporary audio declipping algorithms often ignore the possibility of the presence of additive channel noise. If and when noise is present, however, the efficacy of any declipping algorithm is critically dependent on the accuracy with... more
In this paper we present a study on music mood classification using audio and lyrics information. The mood of a song is expressed by means of musical features but a relevant part also seems to be conveyed by the lyrics. We evaluate each... more
Due to age-bound onset of symptoms used for diagnosis of mild to moderate intellectual disability, early diagnosis of these problems has long been a difficult issue. The diagnosis includes tests pertaining to intellectual functioning and... more
We study auditory context recognition for context-aware mobile computing systems. Auditory contexts are recordings of a mixture of sounds, or ambient audio, from mobile users' everyday environments. For training a classifier, a set of... more
Owing to advances in data storage capacity and the proliferating use of the Internet, information retrieval systems have attracted great interest. Much of this data comes in different forms from various sources. So, it... more
In this paper, starting from a robust statistics (RS) adaptive approach presented in a previous work entitled the combined NLMS-Sign (CNLMS-S) adaptive filter, an automatic combination technique with similar performances is proposed.... more
At present, the rapid proliferation of information poses new challenges for content management. Computerized audio classification, together with content description, is considered a valuable method for managing audio content. In general,... more
Many audio and multimedia applications would benefit if they could interpret the content of audio rather than relying on descriptions or keywords. These applications include multimedia databases and file systems, digital libraries,... more
Musical signals exhibit periodic temporal structure that creates the sensation of rhythm. In order to model, analyze, and retrieve musical signals it is important to automatically extract rhythmic information. To somewhat simplify the... more
Barkhausen noise carries important information which can be used in early damage detection and fault diagnosis. The Barkhausen noise is corrupted by interference signals from other sources during measurement and the information of... more
Acoustic monitoring is crucial for the conservation and management of ecosystems and their flora and fauna. Traditional animal monitoring is based on passive acoustic recordings that are manually assessed by human experts, which adds a... more
Deep learning has celebrated resounding successes in many application areas of relevance to the Internet of Things (IoT), such as computer vision and machine listening. These technologies must ultimately be brought directly to the edge to... more
Significant efforts are being invested to bring state-of-the-art classification and recognition to edge devices with extreme resource constraints (memory, speed, and lack of GPU support). Here, we demonstrate the first deep network for... more
A hybrid speech/non-speech detector is proposed for the pre-processing of broadcast news. During the first stage speech/non-speech classification of uniform overlapping segments is performed. The accuracy in the detection of boundaries is... more
In this paper, a novel collective network of binary classifiers (CNBC) framework is presented for content-based audio classification. The topic has been studied in several publications before, but in many cases the number of different... more
The recently proposed Progressive Query method is a dynamic retrieval technique, which is mainly designed to bring an effective solution especially for queries on large-scale multimedia databases and furthermore to provide periodic... more
Support vector machines (SVMs) have been recently proposed as a new learning algorithm for pattern recognition. In this paper, the SVMs with a binary tree recognition strategy are used to tackle the audio classification problem. We... more
This abstract describes the audio feature extraction and classification algorithm used for the University of Victoria submission to the MIREX (Music Information Retrieval Exchange) 2005. The same audio features and classification... more
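One abstract in the list above describes building a dissimilarity space from a Siamese network and clustering, then training an SVM on it. The sketch below illustrates only the general idea of a dissimilarity space: each sample is re-described by its distances to a small set of prototypes. Euclidean distance stands in for the learned Siamese dissimilarity, plain k-means stands in for the unspecified clustering techniques, and the SVM stage is omitted. All names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Stand-in dissimilarity; the abstract uses a Siamese network instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def to_dissimilarity_space(sample, prototypes, dissimilarity=euclidean):
    """Describe a sample by its dissimilarity to each prototype."""
    return [dissimilarity(sample, p) for p in prototypes]


# Toy 2-D feature vectors (hypothetical) forming two well-separated groups.
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
prototypes = kmeans(training, k=2)
vector = to_dissimilarity_space([0.2, 0.1], prototypes)
```

The dissimilarity vectors, one per training sample, would then serve as the input features for a conventional classifier such as an SVM.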