Key research themes
1. How can machine learning models improve spoken language and speaker classification accuracy under resource constraints and domain variability?
This research area focuses on machine learning (ML) techniques, spanning supervised, unsupervised, and self-supervised learning as well as factorized convolutional neural networks and domain adversarial training, to improve the classification of speech units, spoken languages, voice disorders, and speakers. It addresses limited training data, domain mismatch between clinical and real-world recordings, computational constraints on embedded systems, and variability across speakers and recording conditions. Developing robust, compact, and domain-invariant features is essential for deploying accurate classification systems in practice.
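To illustrate the compactness argument behind factorized convolutions, one common factorization (an assumption here, not a method claimed by any specific paper above) is the depthwise-separable convolution, which replaces a dense kernel with a per-channel spatial filter followed by a 1x1 pointwise mix. A minimal sketch with illustrative layer sizes:

```python
def standard_conv_params(c_in, c_out, k):
    # Dense convolution: one k x k kernel per (input channel, output channel) pair
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # Depthwise k x k filter per input channel, then a 1x1 pointwise projection
    return c_in * k * k + c_in * c_out

# Illustrative layer: 64 -> 128 channels, 3x3 kernels
dense = standard_conv_params(64, 128, 3)      # 73,728 weights
factored = separable_conv_params(64, 128, 3)  # 8,768 weights
print(f"reduction: {dense / factored:.1f}x")  # prints "reduction: 8.4x"
```

The roughly order-of-magnitude parameter reduction, at a modest accuracy cost, is what makes such factorizations attractive for the embedded deployment constraints discussed above.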
2. What acoustic and feature extraction techniques best support speech unit and emotion classification despite speech variability and noise?
This theme explores advanced feature extraction methodologies—such as Mel-frequency cepstral coefficients (MFCC), wavelet packet subband analysis, spectral centroid irregularities, and cepstral-based representations—for robust classification of speech units, speech under stress, and emotions. It investigates how these features capture nuanced spectral-temporal dynamics and energy distribution in speech that are crucial for differentiating phonemes, stressed speech types, or emotional states, particularly in noisy or real-world environments where variability is high.
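The MFCC representation mentioned above follows a standard recipe: pre-emphasis, framing and windowing, power spectrum, mel filterbank, log compression, and a DCT. A minimal NumPy sketch with textbook parameter choices (16 kHz audio, 25 ms frames, 10 ms hop), not tied to any particular paper in this collection:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies attenuated in voiced speech
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, floored to avoid log(0), then log-compressed
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    log_e = np.log(energies)
    # DCT-II decorrelates the filterbank channels; keep the lowest coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return log_e @ basis.T

# Usage on a synthetic one-second 440 Hz tone
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)  # shape: (frames, 13)
```

The log compression and DCT are what give cepstral features their robustness: they approximate the separation of vocal-tract shape (low coefficients) from excitation detail, which is why MFCCs recur across the speech-unit, stress, and emotion classification work surveyed here.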
3. How can automatic classification support medical diagnosis and content management through speech analysis?
This research theme studies automatic speech classification both for medical diagnosis, targeting neurodegenerative diseases and voice disorders, and for audio content management tasks such as call routing and speaker diarization in large datasets. It emphasizes extracting linguistic and acoustic markers from spontaneous speech or pathological voices for early detection of conditions like Alzheimer's disease and voice pathology, as well as efficient indexing and management of large-scale speech or broadcast archives by classifying speaker turns, languages, or call reasons.
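As a concrete illustration of linguistic markers from spontaneous speech, the sketch below computes a few features commonly reported in this literature (speech rate, pause rate, lexical diversity) from word-level timings. The `Word` structure and the 0.5 s pause threshold are hypothetical stand-ins for forced-alignment output; the marker set is illustrative, not a specific paper's feature set:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # onset time in seconds
    end: float    # offset time in seconds

def linguistic_markers(words, min_pause=0.5):
    """Simple spontaneous-speech markers: speech rate (words/s),
    pause statistics, and type-token ratio (lexical diversity)."""
    duration = words[-1].end - words[0].start
    # Inter-word gaps at or above the threshold count as pauses
    pauses = [b.start - a.end for a, b in zip(words, words[1:])
              if b.start - a.end >= min_pause]
    tokens = [w.text.lower() for w in words]
    return {
        "speech_rate_wps": len(words) / duration,
        "pause_rate_per_min": 60.0 * len(pauses) / duration,
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }

# Hypothetical forced-alignment output for a short utterance
words = [Word("the", 0.0, 0.2), Word("cat", 0.3, 0.6),
         Word("the", 1.4, 1.6), Word("cat", 1.7, 2.0)]
markers = linguistic_markers(words)
```

Features of this kind (elevated pause rates, reduced lexical diversity) feed downstream classifiers for early-detection screening, while analogous turn-level acoustic features support diarization and call-routing indexing.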