Speech Processing

12,040 papers

29,322 followers

About this topic

Speech processing is the interdisciplinary field that focuses on the analysis, synthesis, and recognition of human speech. It encompasses various techniques and technologies for converting spoken language into a machine-readable format, enabling applications such as speech recognition, speech synthesis, and speaker identification.

Key research themes

1. How have automatic speech recognition (ASR) systems evolved methodologically to address speech variability and improve recognition accuracy?

This theme examines the technological and methodological progression in ASR systems from early pattern matching techniques to advanced probabilistic models and neural networks. Central challenges include handling intra- and inter-speaker variability, continuous speech recognition, and environmental noise. Understanding these developments is crucial for optimizing ASR accuracy and robustness in diverse real-world settings.

Automatic speech recognition

by John Levis and

2015

Key finding: This paper outlines three major ASR approaches: pattern matching, statistical models based on Hidden Markov Models (HMMs), and neural networks. It highlights HMMs as the predominant statistical method since the 1980s, capable...
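
The key finding names HMMs as the dominant statistical approach to ASR. As an illustration only (not code from the paper), the central computation an HMM-based recognizer performs, which is scoring how likely an observation sequence is under a model, can be sketched with the standard forward algorithm. The discrete output symbols and the toy probabilities are assumptions made for brevity. Practical recognizers typically use continuous emission densities over acoustic feature vectors and work in the log domain to avoid underflow.

```python
from typing import Sequence


def forward_probability(
    initial: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[float]],
    observations: Sequence[int],
) -> float:
    """Return P(observations | model) for a discrete-output HMM.

    initial[i]: probability of starting in state i
    transition[i][j]: probability of moving from state i to state j
    emission[i][k]: probability that state i emits symbol k
    """
    states = range(len(initial))
    # alpha[j] = P(o_1..o_t, state at time t = j), initialised for t = 1.
    alpha = [initial[j] * emission[j][observations[0]] for j in states]
    for symbol in observations[1:]:
        # Sum over every predecessor state, then account for the new symbol.
        alpha = [
            sum(alpha[i] * transition[i][j] for i in states) * emission[j][symbol]
            for j in states
        ]
    return sum(alpha)


# Toy two-state, two-symbol model; every number here is made up.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(forward_probability(pi, A, B, [0, 1, 0]))
```

Summing over predecessor states at each frame keeps the cost at O(T·N²) for T frames and N states, instead of enumerating all N^T state paths.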

Layered markov models: a new architectural approach to automatic speech recognition

by G. Bordel

2025, Proceedings of the 2004 14th IEEE Signal Processing Society Workshop Machine Learning for Signal Processing, 2004.

Key finding: Introduces Layered Markov Models (LMMs), an architectural innovation integrating multiple knowledge levels (acoustic, lexical, language) into a single Markov model framework. LMMs formalize and unify various recognition and...

SPEECH RECOGNITION SYSTEM

by Anupam Awasthi

2017

Key finding: Focuses on implementing an ASR system for embedded, handheld devices, particularly on the PXA27x XScale processor, emphasizing the pipeline from acoustic input to recognized text using HMMs. Key innovations include noise...

2. What roles do multisensory inputs and motor theories play in advancing models of human speech perception?

This theme investigates how speech perception research integrates auditory, visual, and tactile modalities, and how motor theories of perception explain the 'lack of invariance' problem in acoustic signals. Multisensory approaches consider how visual cues (e.g., lip movements) and somatosensory feedback contribute to phonetic interpretation, helping resolve ambiguity and enhancing recognition, with implications for both human and machine perception models.

For speech perception by humans or machines, three senses are better than one

by Lynne Bernstein

2023, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96

Key finding: This paper reviews evidence demonstrating that speech perception is inherently multisensory, involving audition, vision, and touch. Visual speech information significantly improves perception in noisy conditions and can...

Lending a helping hand to hearing: another motor theory of speech perception

by Jeremy Skipper

2025, Action to Language via the Mirror Neuron System

Key finding: Proposes an active, hypothesis-testing motor theory where speech perception involves predicting and interpreting acoustic inputs via visible gestures and other contextual information, addressing the lack of invariant acoustic...

Advances in Understanding the Phenomena and Processing in Audiovisual Speech Perception

by Kaisa Tiippana

2023, Brain Sciences

Key finding: Synthesizes recent EEG and behavioral studies revealing how audiovisual speech perception varies across populations, including individuals with autism spectrum disorder and schizophrenia, and discusses mechanisms underlying...

3. How can open-access clinical speech corpora facilitate reproducible research and the development of AI speech technologies for atypical speech populations?

This theme explores the creation, accessibility, and utility of large clinical speech datasets that support reproducibility, comparative research, clinical training, and AI development for populations with speech sound disorders. Such corpora enable standardized evaluation and algorithm training, support education in speech processing, and help address the under-representation of children and individuals with speech impairments in training data.

Reproducible Speech Research with the Artificial-Intelligence-Ready PERCEPT Corpora

by Elaine Russo Hitchcock

2025

Key finding: Details the development and dissemination of PERCEPT-R and PERCEPT-GFTA corpora comprising over 36 hours of annotated speech from children and young adults with residual speech sound disorders and controls. The corpora are...

All papers in Speech Processing

Locally Measuring Cross-lingual Lexical Alignment: A Domain and Word Level Perspective

by Eitan Grossman

2025, Findings of the Association for Computational Linguistics: EMNLP 2024

NLP research on aligning lexical representation spaces to one another has so far focused on aligning language spaces in their entirety. However, cognitive science has long focused on a local perspective, investigating whether translation...

A Survey Paper on Automatic Speech Recognition by Machine

by yogesh rathore

2025

Speech is the expression of or the ability to express thoughts and feelings by articulate sounds. It is the main way of communication between humans. There are thousands of languages used in the world. Speech recognition is a process of...

CLUSTERING AND DATA MINING ON THE EXAMPLE OF HIV-INFECTED PEOPLE DATA

by Айгуль Кубегенова

2025

The paper discusses aspects of data research, in-depth data analysis, knowledge acquisition, methods of data processing in the knowledge base, methods of intellectual analysis, and application of data mining in the field of medicine. A...

The Ultrasonic Consciousness Hypothesis: Spectral Fractures and Emotional Grounding in the Era of Lossy Audio Compression

by Christopher M Chenoweth

2025

We present the Ultrasonic Consciousness Hypothesis, proposing that the systematic removal of ultrasonic frequencies (20-96kHz) through lossy audio compression since the 1990s may have inadvertently eliminated crucial emotional grounding...

A Hybrid Speech Enhancement Algorithm for Voice Assistance Application

by Sri Preethaa K R

2025, Sensors

In recent years, speech recognition technology has become increasingly common. Speech quality and intelligibility are critical for the convenience and accuracy of information transmission in speech recognition. The speech processing...

Towards Full Lexical Recognition

by Gordana Pavlović-Lažetić

2025, Lecture Notes in Computer Science

Text processing in Serbian is based on the Intex format system of electronic dictionaries. Although lexical recognition is successful for 75% to 90% of word forms (depending on the type of text), some categories of words remain...

The Role of Blockchain, Big Data, and Government Policies in Shaping the Global Economy: A Technological Perspective

by Nirjhor Anjum

2025, International Journal of Advanced Research in Electrical Electronics and Instrumentation Engineering

The world economy is evolving at a very fast rate with revolutionary technologies altering the conventional patterns of business and governance. Among them, blockchain and big data are two prominent forces that are transforming the way...

Innovative AI Approaches in Oral Proficiency Testing: Ethical Implications of ASR Systems

by Prema Subramanian

2025, 3rd IEEE International Conference on Industrial Electronics: Developments and Applications, ICIDeA

Oral proficiency testing plays a critical role in language assessment; however, classic ASR faces problems such as American bias, that is, difficulty recognizing non-American accents, and the evaluation...

SUV Detection Algorithm for Speech Signals

by Shivangi Rai

2025

This paper describes a method that uses thresholding to automatically classify the three parts of a speech signal: silence, unvoiced, and voiced. It makes use of three characteristics of the speech signal, namely short-time...
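
The abstract's list of features is cut off after "short time...", so the exact three characteristics are not recoverable here. As a minimal sketch under that caveat, frame-level thresholding can be illustrated with two features commonly used for this task, short-time energy and zero-crossing rate. The thresholds, frame sizes, and synthetic test signal are hypothetical values, not the paper's.

```python
import math
from typing import List, Sequence


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frames(
    signal: Sequence[float],
    frame_len: int = 160,
    hop: int = 80,
    energy_threshold: float = 1e-3,  # hypothetical value; tune per recording
    zcr_threshold: float = 0.25,     # hypothetical value; tune per recording
) -> List[str]:
    """Label each frame 'silence', 'unvoiced' or 'voiced' by thresholding."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        if short_time_energy(frame) < energy_threshold:
            labels.append("silence")   # too little energy to be speech
        elif zero_crossing_rate(frame) > zcr_threshold:
            labels.append("unvoiced")  # noise-like, many sign changes
        else:
            labels.append("voiced")    # energetic and slowly oscillating
    return labels


# Synthetic 8 kHz signal: silence, a 100 Hz tone, then a rapidly alternating signal.
rate = 8000
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(160)]
hiss = [0.1 if n % 2 == 0 else -0.1 for n in range(160)]
print(classify_frames(silence + tone + hiss, frame_len=160, hop=160))
# ['silence', 'voiced', 'unvoiced']
```

The decision order (energy first, then zero-crossing rate) reflects the usual intuition that silence is low-energy, while unvoiced sounds such as fricatives are noise-like and change sign far more often than voiced sounds.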

Strategies for Implementing an Optimal ASR System for Quranic Recitation Recognition

by YOUSFI ABDELLAH

2025, International Journal of Computer Applications

With the help of automatic speech recognition (ASR) techniques, computers become capable of recognizing speech. The Quran is the speech of Allah (The God); it is the Holy book for all Muslims in the world; it is written and recited in...

The TYPALOC Corpus: A Collection of Various Dysarthric Speech Recordings in Read and Spontaneous Styles

by thierry LEGOU

2025

This paper presents the TYPALOC corpus of French Dysarthric and Healthy speech and the rationale underlying its constitution. The objective is to compare phonetic variation in the speech of dysarthric vs. healthy speakers in different...

Supervised Classification of Baboon Vocalizations

by thierry LEGOU

2025

Optimal Entropy to Enhance the Structure of the Wavelet-Packets-Best-Tree for Automatic Speech Recognition

by Waleed A . Ahmed

2025, Egyptian Journal of Language Engineering,

Best Tree Encoding (BTE) is a promising feature extraction technique based on wavelet packet decomposition that is utilized in Automatic Speech Recognition (ASR). This research introduces an enhancement of Wavelet Packet Best Tree (WPBT)...
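
BTE builds on the best tree of a wavelet packet decomposition, and this paper proposes an entropy-based enhancement whose details are truncated in the abstract. For orientation, the classic entropy-based best-tree search (Coifman and Wickerhauser) can be sketched as follows. The Haar filter, the Shannon entropy cost normalized by total signal energy, and the power-of-two signal length are assumptions of this sketch, not the paper's optimal-entropy criterion.

```python
import math
from typing import List, Sequence, Tuple


def haar_split(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar step: (approximation, detail), each half as long."""
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def shannon_cost(coeffs: Sequence[float], total_energy: float) -> float:
    """Additive entropy cost -sum(p log p), with p = c^2 / total signal energy."""
    cost = 0.0
    for c in coeffs:
        p = c * c / total_energy
        if p > 0:
            cost -= p * math.log(p)
    return cost


def best_basis(x: Sequence[float], max_level: int) -> Tuple[float, List[str]]:
    """Return (cost, leaves) of the cheapest Haar wavelet-packet basis.

    Each leaf is named by its path from the root: 'a' = approximation branch,
    'd' = detail branch, '' = the undecomposed signal.
    """
    total_energy = sum(c * c for c in x)
    if total_energy == 0:
        return 0.0, [""]

    def search(node: List[float], label: str, level: int) -> Tuple[float, List[str]]:
        own_cost = shannon_cost(node, total_energy)
        # Stop at the depth limit or when the node can no longer be halved.
        if level == max_level or len(node) < 2 or len(node) % 2:
            return own_cost, [label]
        approx, detail = haar_split(node)
        cost_a, leaves_a = search(approx, label + "a", level + 1)
        cost_d, leaves_d = search(detail, label + "d", level + 1)
        # Bottom-up pruning: keep the split only if the children are cheaper.
        if cost_a + cost_d < own_cost:
            return cost_a + cost_d, leaves_a + leaves_d
        return own_cost, [label]

    return search(list(x), "", 0)


cost, leaves = best_basis([1.0] * 8, max_level=3)
print(leaves)  # ['aaa', 'aad', 'ad', 'd']
```

Because the cost is additive, each node is compared only against the summed cost of its two children, so the pruning runs bottom-up in a single pass over the tree. For the constant test signal all energy collapses into one approximation coefficient, so the search keeps splitting the approximation branch.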

The Big Australian Speech Corpus (The Big ASC)

by Felicity Cox

2025

Under an ARC Linkage Infrastructure, Equipment and Facilities (LIEF) grant, speech science and technology experts from across Australia have joined forces to organise the recording of audio-visual (AV) speech data from representative...

New computer aided device for real time analysis of speech of people with Parkinson’s disease

by Juan Camilo Vasquez Correa

2025, Revista Facultad de Ingeniería Universidad de Antioquia

Parkinson’s disease (PD) is a neurodegenerative disorder that affects the coordination of muscles and limbs, including those responsible for speech production. The lack of control of the limbs and muscles involved in the...

Area Spt in the Human Planum Temporale Supports Sensory-Motor Integration for Speech Processing

by Kayoko Okada

2025, Journal of Neurophysiology

Processing incoming sensory information and transforming this input into appropriate motor responses is a critical and ongoing aspect of our moment-to-moment interaction with the environment. While the neural mechanisms in the posterior...

Factored four-way conditional restricted Boltzmann machines (FFW-CRBMs) for activity recognition

by Antonio Liotta

2025

This paper introduces a new learning algorithm for human activity recognition capable of simultaneous regression and classification. Building upon Conditional Restricted Boltzmann Machines (CRBMs), Factored Four Way Conditional Restricted...

Genetic wrapper approach for automatic diagnosis of speech disorders related to Autism

by César Gustavo Tobar Martínez

2025, 2013 IEEE 14th International Symposium on Computational Intelligence and Informatics (CINTI)

The pervasive developmental disorders associated with autism lead to impairments in language and social communication. They manifest as atypical prosody production, emotion recognition, and apraxia, among other communication deficits...

Low-saliency prior for disocclusion hole filling in DIBR-synthesized images

by Bruno L . Macchiavello

2025, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Although images as viewed from intermediate virtual viewpoints can be synthesized using texture and depth maps from nearby camera views via depth-image-based rendering (DIBR), the rendered images contain disocclusion holes: spatial regions...

Deterioration of Speech as an Indicator of Physiological Degeneration (DESIPHER)

by Sam Phillips

2025

We approach the problem of understanding the disease progression and speech of patients suffering from ALS as a language divergence identification problem. We summarize the promises and challenges of using speech-related biomarkers to...

Masks of selected Laplace edge filters in digital data processing

by Ireneusz Winnicki

2025, Biuletyn Wojskowej Akademii Technicznej

Abstract. The article discusses the properties of the most commonly used masks of third- and fifth-degree Laplace edge filters (context operators), together with the difference schemes from which they are derived. The construction uses the finite difference method and the finite element method (FEM) with the solution approximated by bilinear functions (the Lagrange finite element space). New convolution masks induced by difference schemes of the Laplace operator are proposed. Each of the ten masks discussed is assigned the so-called Π-form of the first differential approximation. On this basis, one can determine the order of the difference scheme approximating the operator ∇², and thus the order of the mask (the order of a mask and the degree of a mask are different indicators). Moreover (and this is most important), one can determine unambiguously whether a given mask really is a Laplace mask. The paper explains the mathematical foundations and origin of several Laplace filters used in practice and points out certain inaccuracies (repeated in the literature) in their discrete descriptions. Their consequences are shown on several selected satellite images of cloud fields containing a well-developed Cumulonimbus cloud and on a fragment of two- and three-dimensional graphics generated in the MATLAB package. The elements that must always be taken into account when comparing the properties of linear filter masks are identified. The work is theoretical. The basic research presented here draws on a few practical examples that serve to illustrate the conclusions derived. We are aware that unambiguous, or even categorical, final statements and the identification of areas where the results apply always require long-term experience and wide dissemination of the results. We therefore present only a concise procedure for determining the mathematical properties of Laplace edge-filter masks.
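
The abstract describes deciding, from a first differential (Taylor-type) approximation, whether a given mask really approximates the Laplace operator ∇². The paper's Π-form procedure is not reproduced here. As a simpler stand-in under that assumption, the sketch below checks the weight moments of a square mask against a second-order Taylor expansion and returns the factor c for which the mask approximates c·h²∇² (h is the grid spacing), or None if it does not approximate any multiple of the Laplacian.

```python
from typing import Optional, Sequence


def laplace_scale(mask: Sequence[Sequence[float]], tol: float = 1e-9) -> Optional[float]:
    """Return c if a square (2k+1)x(2k+1) mask is consistent with c*h^2*(f_xx + f_yy).

    Taylor-expanding f(x + j*h, y + i*h) about the centre pixel, the mask output is
      m0*f + h*(mx*f_x + my*f_y) + (h^2/2)*(mxx*f_xx + 2*mxy*f_xy + myy*f_yy) + ...
    where m0, mx, ... are the weight moments below. It is a multiple of the
    Laplacian to leading order only if m0 = mx = my = mxy = 0 and mxx = myy != 0.
    Returns None otherwise.
    """
    k = len(mask) // 2  # offsets run from -k to k, centre at (k, k)
    m0 = mx = my = mxx = myy = mxy = 0.0
    for row, weights in enumerate(mask):
        i = row - k  # vertical offset
        for col, w in enumerate(weights):
            j = col - k  # horizontal offset
            m0 += w
            mx += w * j
            my += w * i
            mxx += w * j * j
            myy += w * i * i
            mxy += w * i * j
    if max(abs(m0), abs(mx), abs(my), abs(mxy), abs(mxx - myy)) > tol or abs(mxx) <= tol:
        return None
    return mxx / 2


print(laplace_scale([[0, 1, 0], [1, -4, 1], [0, 1, 0]]))  # 1.0 (5-point mask)
print(laplace_scale([[1, 1, 1], [1, -8, 1], [1, 1, 1]]))  # 3.0 (8-neighbour mask)
print(laplace_scale([[0, 0, 0], [1, -2, 1], [0, 0, 0]]))  # None (f_xx only)
```

The second call illustrates a normalization point that is easy to overlook: the common 8-neighbour mask approximates 3h²∇², so it would need to be divided by 3 wherever a unit-scaled Laplacian is assumed.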

On the Role of Dialogue Context in Predicting Speaking Style

by Roberto Tedesco

2025, Spoken language in the medical field: Linguistic analysis, technological applications and clinical

Text-to-Speech (TTS) synthesis is a problem almost as old as Natural Language Processing (NLP). The focus of this problem is on creating tools capable of generating a voice uttering a given text. Deep learning-powered solutions try to go...

On the Relationship Between Bayes Risk and Word Error Rate in ASR

by Thierry Dutoit

2025, IEEE Transactions on Audio, Speech, and Language Processing

Recently, a number of (approximate) approaches have emerged in speech processing that try to overcome the known mismatch between symbol-level evaluation measures (e.g., word error rate) and the standard string (symbol sequence) cost...
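
The abstract contrasts symbol-level evaluation measures such as word error rate with the string-level cost used in standard decoding. For reference, WER is the word-level Levenshtein (edit) distance between the reference and the recognized word sequence, divided by the number of reference words. Below is a minimal dynamic-programming sketch, assuming whitespace tokenization and no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)  # 0 cost if words match
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.3333333333333333
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.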

Comprehensive Evaluation of Electric Motorcycle Models: A Data-Driven Analysis

by Elavarasi Kesavan

2025, Rest publisher

The adoption of Electric Vehicles (EVs) over traditional fossil fuel-based vehicles (FVs) offers a multitude of benefits. These advantages encompass improved transportation energy efficiency, reduced carbon and noise emissions, and the...

Albayzin 2014 Evaluation: TES-UAB Audio Segmentation System

by HECTOR ABRAHAM ROSILLO DELGADO

2025

This paper describes the audio segmentation system developed by Transmedia Catalonia / Telecommunication and Systems Engineering Department, at the Autonomous University of Barcelona (UAB), for the Albayzin 2014 Audio Segmentation...

Science Sphere

by Richard M Mwangi

2025

Speech Communication

by Lorin Wilde

2025

Contains research objectives. U.S. Air Force (Electronic Systems Division) under Contract AF 19(628)-3325; National Science Foundation (Grant G-16526); National Institutes of Health (Grant MH-04737-03); National Institutes of Health (Grant...

Research process on deep learning methods for heart sounds classification

by ZENTIME Editor

2025, Progress in Medical Devices

Cardiovascular diseases are still the primary threats to people's health around the world. Automatic heart sound classification technology, as a fast and efficient means for diagnosis and treatment, is of great clinical significance. With...

RECOGNITION OR SYNTHESIS OF HARMONIC SOUNDS PRONOUNCED BY A HUMAN BEING

by Boris Fridman-Mintz

2025, Canadian Intellectual Property Office

This patent is based on a novel model of human categorial perception of harmonic speech sounds, alternative to the currently dominant formant theory and to the NLP procedures based on the concepts of amplitude envelope and Mel-frequency...

Operational Coordination in LLM Dialogues: Falsifier Tests (F1-F3) and a Non-linguistic Activation Protocol (L)

by Karel Hrubec

2025, Operational Coordination in LLM Dialogues: Falsifier Tests (F1–F3) and a Non-linguistic Activation Protocol (L)

This paper reports a methods-driven exploration of operational (pre-conscious) coordination in large language model (LLM) dialogues. We introduce three falsifier tests that avoid ontological claims: F1 (identity persistence under...

Identity-based Trusted Authentication in Wireless Sensor Network

by Yusnani Mohd Yussoff

2025

Secure communication mechanisms in Wireless Sensor Networks (WSNs) have been widely deployed to ensure confidentiality, authenticity and integrity of the nodes and data. Recently, many WSN applications rely on trusted communication to...

Empathic interaction with a virtual guide

by Meiyii Lim

2025, Proceeding of the Joint Symposium on Virtual Social Agents, AISB

The Empathic Tour Guide System is a context-aware mobile system, including an 'intelligent empathic guide with attitude', offering the user a seamless, temporally and spatially dependent, multi-modal interaction interface. It...

Nocturne Job and Nursing Workers Morbidity

by Ana Lucia Cardoso Kirchhof

2025

Exploratory descriptive study conducted in a public hospital in Curitiba. The population consisted of 173 night-shift nursing workers, and the aim was to identify the main health problems affecting these workers...

Understanding workloads in occupational health research in nursing

by Ana Lucia Cardoso Kirchhof

2025

Introduction: The concept of workloads has been used in nursing with several meanings, among them patient dependency and work intensity. We propose adding an understanding of workloads as...

Semi-Supervised Learning to Enhance Speech Signal for Mobile Communication

by Chethan Gowda R K

2025, SN Computer Science

Developing a machine learning algorithm to enhance complex speech signals for mobile communication is one of the research problems in signal processing. The objective of this research paper is to develop a learning algorithm that improves the...

Artificial intelligence methods in diagnostics of coal-biomass blends co-combustion in pulverised coal burners

by Volodymyr Lytvynenko

2025

They use optical sensors and artificial intelligence methods for process supervision and diagnostics. The research aims to develop a system allowing a parametric evaluation of the quality of pulverized coal burner operation. Due to the...

User modeling for detecting faking-good intent in online personality questionnaires in the wild based on mouse dynamics

by Eduard Kuric

2025, Multimedia Tools and Applications

With widespread use of online forms and questionnaires, detection of the user's intent to lie has become increasingly important. In-lab studies have shown that mouse dynamics (information on how the user operates a mouse) can be valuable...

Can behavioral features reveal lying in an online personality questionnaire? The impact of mouse dynamics and speech

by Eduard Kuric

2025, Computers in Human Behavior Reports

People who deceive in personality assessment questionnaires can resort to lying in pursuit of socially harmful goals. Validating the veracity of answers is a complex challenge. Traditional social desirability scales have been...

Sparsity Level In A Non-Negative Matrix Factorization Based Speech Strategy In Cochlear Implants

by Arne Leijon

2025

Publication in the conference proceedings of EUSIPCO, Bucharest, Romania, 2012

Mixture-of-Experts-and-Depths: A Hierarchical Dynamic Compute Architecture for Extreme-Scale Efficiency

by Kalyan Chakravarthy

2025, IRE Journals

The pursuit of larger, more capable Large Language Models (LLMs) is fundamentally constrained by the immense computational cost of their training and inference. While the Mixture-of-Experts (MoE) paradigm successfully decouples model...

Development of an integrated multi-modal communication robotic face

by Takaaki Kuratate

2025, 2012 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO)

Automatic Understanding of ATC Speech: Study of Prospectives and Field Experiments for Several Controller Positions

by Rubén San Segundo

2025, IEEE Transactions on Aerospace and Electronic Systems

Although there has been a lot of interest in recognizing and understanding air traffic control (ATC) speech, none of the published works have obtained detailed field data results. We have developed a system able to identify the language...

Automatic Understanding of ATC Speech

by Rubén San Segundo

2025, IEEE Aerospace and Electronic Systems Magazine

In this paper we critically review the state of the art in automatic speech processing as applied to Air Traffic Control. We present the development of a new ATC speech understanding system comparing its performance and...

Proceedings of SemDial 2012 (SeineDial)

by Colin Matheson

2025

Spoken dialogue systems can encounter different types of errors, including nonunderstanding errors where the system recognises that the user has spoken, but does not understand the utterance. Strategies for dealing with this kind of error...

Real-Time Robust Automatic Speech Recognition Using Compact Support Vector Machines

by Ana Isabel Garcia Moral

2025, IEEE Transactions on Audio, Speech, and Language Processing

SIAK - A Game for Foreign Language Pronunciation Learning

by Kalle Palomäki

2025

We introduce a digital game for children’s foreign-language learning that uses automatic speech recognition (ASR) for evaluating children’s utterances. Our first prototype focuses on the learning of English words and their pronunciation...

Corresponding author

by Kalle Palomäki

2025

In this study we describe two techniques for handling convolutional distortion with ‘missing data’ speech recognition using spectral features. The missing data approach to automatic speech recognition (ASR) is motivated by a model of...

Speech recognition in older adults: proposal of a test considering word predictability

by Maristela Costa

2025, Audiology - Communication Research

ABSTRACT Objective: To evaluate speech recognition, taking word predictability into account, using a specially developed test. Methods: Anamnesis, screening tests for cognitive impairment and depression, and an evaluation...

Sound spotting: an approach to content-based sound retrieval

by Richard Polfreman

2025

We present an approach to content-based sound retrieval using auditory models, self-organizing neural networks, and string matching techniques. It addresses the issues of spotting perceptually similar occurrences of a particular sound...
