Academia.edu

Speech Processing

12,040 papers
29,322 followers
About this topic
Speech processing is the interdisciplinary field that focuses on the analysis, synthesis, and recognition of human speech. It encompasses various techniques and technologies for converting spoken language into a machine-readable format, enabling applications such as speech recognition, speech synthesis, and speaker identification.

Key research themes

1. How have automatic speech recognition (ASR) systems evolved methodologically to address speech variability and improve recognition accuracy?

This theme examines the technological and methodological progression in ASR systems from early pattern matching techniques to advanced probabilistic models and neural networks. Central challenges include handling intra- and inter-speaker variability, continuous speech recognition, and environmental noise. Understanding these developments is crucial for optimizing ASR accuracy and robustness in diverse real-world settings.

by John Levis and 1 more
Key finding: This paper outlines three major ASR approaches: pattern matching, statistical models based on Hidden Markov Models (HMMs), and neural networks. It highlights HMMs as the predominant statistical method since the 1980s, capable...
Key finding: Introduces Layered Markov Models (LMMs), an architectural innovation integrating multiple knowledge levels (acoustic, lexical, language) into a single Markov model framework. LMMs formalize and unify various recognition and...
Key finding: Focuses on implementing an ASR system for embedded, handheld devices, particularly on the PXA27x XScale processor, emphasizing the pipeline from acoustic input to recognized text using HMMs. Key innovations include noise...
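
Recognition accuracy for ASR systems such as those summarized above is commonly reported as word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that calculation, not taken from any of the listed papers. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, which is an assumption and not a standard scoring setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words. Note that WER can exceed 1.0 when the hypothesis contains many insertions.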

2. What roles do multisensory inputs and motor theories play in advancing models of human speech perception?

This theme investigates how speech perception research integrates auditory, visual, and tactile modalities, and how motor theories of perception explain the 'lack of invariance' problem in acoustic signals. Multisensory approaches consider how visual cues (e.g., lip movements) and somatosensory feedback contribute to phonetic interpretation, helping resolve ambiguity and enhancing recognition, with implications for both human and machine perception models.

Key finding: This paper reviews evidence demonstrating that speech perception is inherently multisensory, involving audition, vision, and touch. Visual speech information significantly improves perception in noisy conditions and can...
Key finding: Proposes an active, hypothesis-testing motor theory where speech perception involves predicting and interpreting acoustic inputs via visible gestures and other contextual information, addressing the lack of invariant acoustic...
Key finding: Synthesizes recent EEG and behavioral studies revealing how audiovisual speech perception varies across populations, including individuals with autism spectrum disorder and schizophrenia, and discusses mechanisms underlying...

3. How can open-access clinical speech corpora facilitate reproducible research and the development of AI speech technologies for atypical speech populations?

This theme explores the creation, accessibility, and utility of large clinical speech datasets to support reproducibility, comparative research, clinical training, and AI development for populations with speech sound disorders. Such corpora enable standardized evaluation and algorithm training and facilitate education in speech processing, particularly by addressing the underrepresentation of children and individuals with speech impairments in training data.

Key finding: Details the development and dissemination of PERCEPT-R and PERCEPT-GFTA corpora comprising over 36 hours of annotated speech from children and young adults with residual speech sound disorders and controls. The corpora are...

All papers in Speech Processing

NLP research on aligning lexical representation spaces to one another has so far focused on aligning language spaces in their entirety. However, cognitive science has long focused on a local perspective, investigating whether translation...
Speech is the expression of or the ability to express thoughts and feelings by articulate sounds. It is the main way of communication between humans. There are thousands of languages used in the world. Speech recognition is a process of...
The paper discusses aspects of data research, in-depth data analysis, knowledge acquisition, methods of data processing in the knowledge base, methods of intellectual analysis, and application of data mining in the field of medicine. A...
We present the Ultrasonic Consciousness Hypothesis, proposing that the systematic removal of ultrasonic frequencies (20-96 kHz) through lossy audio compression since the 1990s may have inadvertently eliminated crucial emotional grounding...
In recent years, speech recognition technology has become a more common notion. Speech quality and intelligibility are critical for the convenience and accuracy of information transmission in speech recognition. The speech processing...
Text processing in Serbian is based on the Intex format system of electronic dictionaries. Although lexical recognition is successful for 75% to 90% of word forms (depending on the type of text), some categories of words remain...
The world economy is evolving at a very fast rate with revolutionary technologies altering the conventional patterns of business and governance. Among them, blockchain and big data are two prominent forces that are transforming the way...
Oral proficiency testing plays a critical role in language assessment; however, classic ASR faces problems such as American bias, meaning that ASR applications have difficulty recognizing non-American accents, and the evaluation...
This paper describes a method to use the thresholding technique to automatically classify the three parts of a speech signal: silence, unvoiced and voiced. It makes use of three characteristics of the speech signal, namely short time...
With the help of automatic speech recognition (ASR) techniques, computers become capable of recognizing speech. The Quran is the speech of Allah (The God); it is the Holy book for all Muslims in the world; it is written and recited in...
This paper presents the TYPALOC corpus of French Dysarthric and Healthy speech and the rationale underlying its constitution. The objective is to compare phonetic variation in the speech of dysarthric vs. healthy speakers in different...
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or...
Best Tree Encoding (BTE) is a promising feature extraction technique based on wavelet packet decomposition that is utilized in Automatic Speech Recognition (ASR). This research introduces an enhancement of Wavelet Packet Best Tree (WPBT)...
Under an ARC Linkage Infrastructure, Equipment and Facilities (LIEF) grant, speech science and technology experts from across Australia have joined forces to organise the recording of audio-visual (AV) speech data from representative...
Parkinson’s disease (PD) is a neurodegenerative disorder that affects the coordination of muscles and limbs, including those responsible for speech production. The lack of control of the limbs and muscles involved in the...
Processing incoming sensory information and transforming this input into appropriate motor responses is a critical and ongoing aspect of our moment-to-moment interaction with the environment. While the neural mechanisms in the posterior...
This paper introduces a new learning algorithm for human activity recognition capable of simultaneous regression and classification. Building upon Conditional Restricted Boltzmann Machines (CRBMs), Factored Four Way Conditional Restricted...
The pervasive developmental disorders associated with autism lead to impairments in language and social communication. They are evidenced as atypical prosody production, emotion recognition and apraxia, among other communication deficits....
Although images as viewed from intermediate virtual viewpoints can be synthesized using texture and depth maps from nearby camera views via depth-image-based rendering (DIBR), the rendered images contain disocclusion holes, spatial regions...
We approach the problem of understanding the disease progression and speech of patients suffering from ALS as a language divergence identification problem. We summarize the promises and challenges of using speech-related biomarkers to...
Abstract. The article discusses the properties of the most commonly used edge-filter masks (contextual operators), Laplacians of the third and fifth degree, together with the difference schemes that form the basis of their...
Text-to-Speech (TTS) synthesis is a problem almost as old as Natural Language Processing (NLP). The focus of this problem is on creating tools capable of generating a voice uttering a given text. Deep learning-powered solutions try to go...
Recently, a number of (approximate) approaches emerged in speech processing, which try to overcome the known lack of match between symbol level evaluation measures (e.g., word error rate) and the standard string (symbol sequence) cost...
The adoption of Electric Vehicles (EVs) over traditional fossil fuel-based vehicles (FVs) offers a multitude of benefits. These advantages encompass improved transportation energy efficiency, reduced carbon and noise emissions, and the...
This paper describes the audio segmentation system developed by Transmedia Catalonia / Telecommunication and Systems Engineering Department, at the Autonomous University of Barcelona (UAB), for the Albayzin 2014 Audio Segmentation...
Contains research objectives. U.S. Air Force (Electronic Systems Division) under Contract AF 19(628)-3325; National Science Foundation (Grant G-16526); National Institutes of Health (Grant MH-04737-03); National Institutes of Health (Grant...
Cardiovascular diseases are still the primary threats to people's health around the world. Automatic heart sound classification technology, as a fast and efficient means for diagnosis and treatment, is of great clinical significance. With...
This patent is based on a novel model of human categorical perception of harmonic speech sounds, alternative to the currently dominant formant theory and to the NLP procedures based on the concepts of amplitude envelope and Mel-frequency...
This paper reports a methods-driven exploration of operational (pre-conscious) coordination in large language model (LLM) dialogues. We introduce three falsifier tests that avoid ontological claims: F1 (identity persistence under...
Secure communication mechanisms in Wireless Sensor Networks (WSNs) have been widely deployed to ensure confidentiality, authenticity and integrity of the nodes and data. Recently, many WSN applications rely on trusted communication to...
The Empathic Tour Guide System is a context-aware mobile system, including an 'intelligent empathic guide with attitude', offering the user a seamless, temporally and spatially dependent, multi-modal interaction interface. It...
Exploratory descriptive study carried out in a public hospital in Curitiba. The population consisted of 173 night-shift nursing workers, with the aim of identifying the main health problems of these workers....
Introduction: The concept of workloads has been used in nursing with several meanings, among them patient dependency and work intensity. We propose adding an understanding of workloads as...
Developing a machine learning algorithm to enhance complex speech signals for mobile communication is one of the research problems in signal processing. The objective of this research paper is to develop a learning algorithm that improves the...
They use optical sensors and artificial intelligence methods for process supervision and diagnostics. The research aims to develop a system allowing parametric evaluation of the quality of pulverized coal burner operation. Due to the...
With widespread use of online forms and questionnaires, detection of the user's intent to lie has become increasingly important. In-lab studies have shown that mouse dynamics (information on how the user operates a mouse) can be valuable...
People who deceive in personality assessment questionnaires can resort to lying in pursuit of socially harmful goals. Validating the veracity of answers is a complex challenge. Traditional social desirability scales have been...
The pursuit of larger, more capable Large Language Models (LLMs) is fundamentally constrained by the immense computational cost of their training and inference. While the Mixture-of-Experts (MoE) paradigm successfully decouples model...
Although there has been a lot of interest in recognizing and understanding air traffic control (ATC) speech, none of the published works have obtained detailed field data results. We have developed a system able to identify the language...
In this paper we make a critical review of the state-of-the-art in automatic speech processing as applied to Air Traffic Control. We present the development of a new ATC speech understanding system comparing its performance and...
Spoken dialogue systems can encounter different types of errors, including nonunderstanding errors where the system recognises that the user has spoken, but does not understand the utterance. Strategies for dealing with this kind of error...
We introduce a digital game for children’s foreign-language learning that uses automatic speech recognition (ASR) for evaluating children’s utterances. Our first prototype focuses on the learning of English words and their pronunciation....
In this study we describe two techniques for handling convolutional distortion with ‘missing data’ speech recognition using spectral features. The missing data approach to automatic speech recognition (ASR) is motivated by a model of...
ABSTRACT Objective: To evaluate speech recognition, considering word predictability, using a purpose-built test. Methods: Anamnesis, screening tests for cognitive impairment and depression, and evaluation...
We present an approach to content-based sound retrieval using auditory models, self-organizing neural networks, and string matching techniques. It addresses the issues of spotting perceptually similar occurrences of a particular sound...
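
One of the abstracts listed above describes threshold-based classification of speech frames into silence, unvoiced and voiced segments. The Python sketch below illustrates the general idea under stated assumptions; it is not the method of that paper. Because the abstract is truncated, the two features used here (short-time energy and zero-crossing rate) are an assumption, and the function names, frame length and threshold values are hypothetical placeholders that would need tuning on real recordings.

```python
def frame_features(samples, frame_len):
    """Yield (energy, zero_crossing_rate) for consecutive non-overlapping frames.

    Trailing samples that do not fill a whole frame are ignored.
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        yield energy, crossings / (frame_len - 1)


def classify_frames(samples, frame_len=160, energy_floor=1e-4, zcr_split=0.25):
    """Label each frame 'silence', 'unvoiced' or 'voiced' by fixed thresholds.

    energy_floor and zcr_split are illustrative values, not published ones.
    """
    labels = []
    for energy, zcr in frame_features(samples, frame_len):
        if energy < energy_floor:
            labels.append("silence")    # too quiet to count as speech
        elif zcr > zcr_split:
            labels.append("unvoiced")   # noise-like: many sign changes
        else:
            labels.append("voiced")     # periodic: few sign changes
    return labels
```

The design relies on two rough tendencies: silence has very low energy, and unvoiced sounds such as fricatives change sign far more often than voiced sounds. Fixed thresholds are fragile in noisy recordings, so a practical system would likely adapt them to the signal.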