Academia.edu

speech classification

11 papers
0 followers
About this topic
Speech classification is the process of categorizing spoken language into predefined classes based on its acoustic features, linguistic content, or speaker characteristics. This field encompasses various techniques from signal processing, machine learning, and linguistics to analyze and interpret speech for applications such as automatic speech recognition and speaker identification.
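
To make the definition concrete, here is a minimal sketch of one common pipeline: summarise each clip with MFCC statistics and train a standard classifier. It is an illustrative example only, assuming librosa and scikit-learn; the file paths, class labels, and hyperparameters are placeholders, not taken from any of the papers listed on this page.

```python
# Minimal speech-classification sketch: MFCC summary features + SVM.
# Assumes librosa and scikit-learn are installed; the file paths, labels,
# and hyperparameters below are hypothetical placeholders.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def clip_features(path, sr=16000, n_mfcc=13):
    """Summarise one clip as the mean and std of its MFCC frames."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical training data: (audio path, class label) pairs.
train_items = [("clips/yes_01.wav", "yes"), ("clips/no_01.wav", "no")]

X = np.stack([clip_features(path) for path, _ in train_items])
y = [label for _, label in train_items]

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
model.fit(X, y)

# Classify a new, unseen clip (also a placeholder path).
print(model.predict(clip_features("clips/unknown.wav").reshape(1, -1)))
```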

Key research themes

1. How can machine learning models improve spoken language and speaker classification accuracy under resource constraints and domain variability?

This research area focuses on leveraging machine learning (ML) techniques, including supervised and unsupervised learning, factorized convolutional neural networks, domain adversarial training, and self-supervised learning, to enhance the classification of speech units, spoken languages, voice disorders, and speakers. It addresses challenges related to limited training data, domain mismatch between clinical and real-world data, computational constraints for embedded systems, and variability across speakers and recording conditions. Developing robust, compact, and domain-invariant features is vital to deploying accurate classification systems in practical applications.
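
One recurring ingredient in these findings is domain-adversarial training, where a gradient-reversal layer pushes the feature extractor toward representations that a domain discriminator cannot tell apart (for example, clinical versus real-world recordings). The PyTorch sketch below illustrates the general mechanism only; the layer sizes, the reversal weight, and the two-domain setup are assumptions for illustration and not the factorized-CNN architectures used in the cited papers.

```python
# Sketch of domain-adversarial feature learning with a gradient-reversal
# layer, assuming PyTorch. Layer sizes, the reversal weight, and the
# two-domain setup are illustrative assumptions only.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip (and scale) the gradient flowing back into the extractor.
        return -ctx.lambd * grad_output, None

feature_extractor = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64))
label_head = nn.Linear(64, 3)    # e.g. three target classes
domain_head = nn.Linear(64, 2)   # e.g. clinical vs. real-world recordings

params = (list(feature_extractor.parameters())
          + list(label_head.parameters())
          + list(domain_head.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)
criterion = nn.CrossEntropyLoss()

def training_step(x, y_label, y_domain, lambd=0.5):
    feats = feature_extractor(x)
    class_loss = criterion(label_head(feats), y_label)
    # The domain head receives reversed gradients, so minimising its loss
    # drives the features to become harder to separate by domain.
    domain_loss = criterion(domain_head(GradReverse.apply(feats, lambd)), y_domain)
    loss = class_loss + domain_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example step on random stand-in data (batch of 8, 40-dim acoustic features).
x = torch.randn(8, 40)
print(training_step(x, torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,))))
```

In practice the reversal weight is usually not held fixed but ramped up over the course of training, so the label classifier stabilises before the adversarial signal dominates.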

Key finding: This paper presents a hierarchical discrimination scheme combining supervised and unsupervised machine learning to classify spoken languages in realistic broadcast radio content with multilingual interference and limited...
Key finding: The study deploys factorized convolutional neural networks with domain adversarial training to derive domain-invariant features that overcome domain mismatch between clean clinical recordings and noisy real-world voice data...
Key finding: This work proposes a novel approach using Gaussian Mixture Models (GMM) to learn class-specific multiple dictionaries for sparse feature extraction, improving speech unit classification across various languages and tasks...
Key finding: By combining glottal source estimation with features from pre-trained self-supervised models (wav2vec 2.0 and HuBERT) and employing hierarchical multi-class classifiers, this study achieves marked improvements in three-class...
Key finding: This paper introduces a lifelong learning speaker diarization framework that incrementally adapts to dataset shift and discontinuous audio streams by incorporating human-in-the-loop active correction processes. Using the...

2. What acoustic and feature extraction techniques best support speech unit and emotion classification despite speech variability and noise?

This theme explores advanced feature extraction methodologies—such as Mel-frequency cepstral coefficients (MFCC), wavelet packet subband analysis, spectral centroid irregularities, and cepstral-based representations—for robust classification of speech units, speech under stress, and emotions. It investigates how these features capture nuanced spectral-temporal dynamics and energy distribution in speech that are crucial for differentiating phonemes, stressed speech types, or emotional states, particularly in noisy or real-world environments where variability is high.
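
To make the subband idea concrete, the sketch below computes a simple wavelet-packet subband energy vector for a short speech frame with PyWavelets, next to standard MFCCs for comparison. The wavelet choice, decomposition depth, frame length, and the synthetic stand-in signal are arbitrary assumptions; the Scale-Energy and cepstral variants proposed in the cited work involve additional processing not reproduced here.

```python
# Sketch: wavelet-packet subband energies next to MFCCs for a speech frame.
# Assumes PyWavelets and librosa; the wavelet, depth, frame length, and the
# synthetic stand-in signal are arbitrary choices for illustration.
import numpy as np
import pywt
import librosa

def wavelet_packet_energies(frame, wavelet="db4", level=3):
    """Log energy of each terminal subband of a wavelet-packet tree."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, mode="symmetric",
                            maxlevel=level)
    nodes = wp.get_level(level, order="freq")  # subbands ordered by frequency
    energies = np.array([np.sum(np.square(node.data)) for node in nodes])
    return np.log(energies + 1e-10)

# Synthetic one-second stand-in for a speech signal at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t) + 0.05 * np.random.randn(sr)

subband_feat = wavelet_packet_energies(signal[:512])          # 2**3 = 8 values
mfcc_feat = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, n_frames)

print(subband_feat.shape, mfcc_feat.shape)
```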

Key finding: The study introduces novel subband features derived from wavelet packet analysis, including Scale Energy, Autocorrelation-Scale-Energy and their cepstral counterparts, which outperform classical MFCC and autocorrelation-Mel...
Key finding: Utilizing utterance-level pitch and energy statistics, the paper compares three classifiers—linear discriminant, k-nearest neighbor, and support vector machine—for emotion recognition in real human-machine telephone dialogs...
Key finding: A speech recognition system for Malayalam digits employs Discrete Wavelet Transforms (DWT) for feature extraction combined with wavelet denoising for noise suppression, comparing Artificial Neural Networks (ANN), Support...
Key finding: This research develops a hybrid expert system leveraging neural networks and fuzzy classifiers to classify audio signals, including animal (bird) species based on their sounds. Using low-level MPEG-7 descriptors and neural...
Key finding: By analyzing phoneme distributions among Gaussian components of speaker models, this study reveals that specific phonetic content contributes differentially to speaker recognition. The findings suggest that selecting speech...

3. How can automatic classification support medical diagnosis and content management through speech analysis?

This research theme studies the application of automatic speech classification both to medical diagnosis of neurodegenerative diseases and voice disorders and to content management tasks such as call routing and speaker diarization over large datasets. It emphasizes extracting linguistic and acoustic markers from spontaneous speech or pathological voices for early detection of conditions such as Alzheimer's disease and voice pathology, as well as efficient indexing and management of large-scale speech or broadcast audio by classifying speaker turns, languages, or call reasons.
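
As a toy illustration of the transcript-based side of this theme, the sketch below trains a bag-of-words classifier to separate two diagnostic groups from transcribed speech using scikit-learn. The transcripts and labels are invented placeholders; studies on corpora such as DementiaBank's Pitt Corpus rely on far richer linguistic and acoustic features and far more data.

```python
# Toy sketch: separating diagnostic groups from speech transcripts with
# scikit-learn. Transcripts and labels are invented placeholders, not data
# from DementiaBank or any of the papers listed on this page.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

transcripts = [
    "the boy is um reaching for the the cookie jar and um",    # placeholder
    "the girl asks her brother to hand her a cookie quietly",  # placeholder
    "there is water um running over the the sink I think",     # placeholder
    "the mother is drying dishes while the sink overflows",    # placeholder
]
labels = ["AD", "control", "AD", "control"]  # invented labels

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # unigram + bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(transcripts, labels)

print(model.predict(["the boy um the boy is taking um a cookie"]))
```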

Key finding: Using DementiaBank's Pitt Corpus transcripts, the study trains machine learning and neural networks to classify Alzheimer's disease (AD) from normal aging and other neurodegenerative diseases based on linguistic...
Key finding: The authors propose a fully automated neurological disease classifier that processes raw speech spectrograms with a data-augmentation-supported artificial neural network, achieving 93.3% accuracy and 88.5% F1-score on the...
Key finding: This paper demonstrates the use of Hidden Markov Models with mel-frequency cepstral coefficients as acoustic features to classify multiple types of vocal pathologies from Dutch speech data. The approach extends prior binary...
Key finding: To handle large-scale call reason classification with 250 target classes and no annotated target domain data, this study exploits over 300,000 annotated utterances from related domains coupled with iterative re-annotation,...
Key finding: The ALLIES corpus and associated protocols support human-assisted lifelong learning in speaker diarization over large diverse audio collections. The introduced system incrementally processes shows, links speakers cross-show,...

All papers in speech classification

Cognitive and mental deterioration, such as difficulties with memory and language, is among the typical phenotypes of most neurodegenerative diseases, including Alzheimer's disease and other forms of dementia. This paper describes the...
During the outbreak of the COVID-19 pandemic, several research areas joined efforts to mitigate the damage caused by SARS-CoV-2. In this paper we present an interpretability analysis of a convolutional neural network-based model for COVID-19...
Alzheimer's dementia (AD) affects memory, thinking, and language, deteriorating a person's life. An early diagnosis is very important, as it enables the person to receive medical help and ensures quality of life. Therefore, leveraging...
Speech recognition is widely used and has become part of our day-to-day lives. Several massive and popular applications have taken its use to another level. Most of the existing systems use machine learning techniques such as...
Speech- and language-based automatic dementia detection is of interest because it is non-invasive, low-cost and potentially able to aid diagnostic accuracy. The collected data are mostly audio recordings of spoken language, and these can...
Dementia is a group of irreversible, chronic, and progressive neurodegenerative disorders resulting in impaired memory, communication, and thought processes. In recent years, clinical research advances in brain aging have focused on the...
Speech analysis could provide an indicator of cognitive health and help develop clinical tools for automatically detecting and monitoring cognitive health progression. The Mini Mental Status Examination (MMSE) is the most widely used...
Continuous speech recognition is a multileveled pattern recognition task, which includes speech segmentation, classification, feature extraction and pattern recognition. In our work, a blind speech segmentation procedure was used to...
Background: To evaluate the value of using automatic speech analyses for the assessment of mild cognitive impairment (MCI) and early-stage Alzheimer's disease (AD). Methods: Healthy elderly control (HC) subjects and patients with MCI...
Even today, the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since...
Hearing a species in a tropical rainforest is much easier than seeing it. Someone in the forest might not be able to look around and see every type of bird and frog that is there, but they can be heard. A forest ranger might...
Life expectancy has increased globally, and the increasing prevalence of age-related issues is a major societal challenge. In particular, the World Health Organisation estimates that the number of people suffering from dementia worldwide will grow to...
Spoken Language Identification (LID) systems are needed to identify the language(s) present in a given audio sample and are typically the first step in many speech processing tasks such as automatic speech recognition (ASR)...
This paper presents some preliminary results of the OPLON project. It aimed at identifying early linguistic symptoms of cognitive decline in the elderly. This pilot study was conducted on a corpus composed of spontaneous speech samples...
This paper investigates class-based speech recognition, and more precisely the impact of the selection of the training samples for each class on the final speech recognition performance. Increasing the number of recognition classes should...
Remote, automated cognitive impairment (CI) diagnosis has the potential to facilitate care for the elderly. Speech is easily collected over the phone, and some common cognitive tests are already administered remotely, resulting in regular...
Alzheimer’s disease (AD) is an insidious progressive neurodegenerative disease resulting in impaired cognition, dementia, and eventual death. At the earliest stages of the disease, decline in multiple cognitive domains including speech...
In this study, a novel method based on the voice intensity of a speech signal is used for automatic pathology detection with continuous speech. The proposed method determines the peaks from the speech signal to form a voice contour. The...
With the rise of human-machine interactions, it has become necessary for machines to better understand humans in order to respond appropriately. Hence, to increase communication and interaction, it would be ideal for machines to...
Background: We developed transformer-based deep learning models based on natural language processing for early diagnosis of Alzheimer’s disease from the picture description test. Methods: The lack of large datasets poses the most important...
With the global population ageing rapidly, Alzheimer's disease (AD), which has an insidious onset followed by gradual, irreversible deterioration in cognitive domains (memory, communication, etc.), is particularly prominent in older adults...
Background: Advances in machine learning (ML) technology have opened new avenues for detection and monitoring of cognitive decline. In this study, a multimodal approach to Alzheimer's dementia detection based on the patient's...
Alzheimer's disease is a fatal, progressive brain disorder that worsens with time. It is high time we had inexpensive and quick clinical diagnostic techniques for early detection and care. In previous studies, various machine learning...
Deep learning models have improved cutting-edge technologies in many research areas, but their black-box structure makes it difficult to understand their inner workings and the rationale behind their predictions. This may lead to...
The ongoing development of audio datasets for numerous languages has spurred research activities towards designing smart speech recognition systems. A typical speech recognition system can be applied in many emerging applications, such as...
Background: We developed transformer-based deep learning models based on natural language processing for early risk assessment of Alzheimer’s disease from the picture description test. Methods: The lack of large datasets poses the most...
According to the World Health Organization, the number of people suffering from dementia worldwide will grow to 150 million by mid-century, and Alzheimer’s disease is the most common form of dementia, contributing to 60%–70% of cases. The...
The auditory-based method is commonly used in the assessment of voice disorders. This method is subjective in the sense that the evaluation result depends on the listener, and a great deal of expertise is required to obtain reproducible...