Voice Production

61 papers

15 followers

About this topic

Voice production is the physiological and acoustic process by which humans generate sound through the vocal folds in the larynx, modulating airflow and pressure to create speech and other vocalizations. It encompasses the study of the mechanics of phonation, resonance, and articulation in the context of communication and performance.

Key research themes

1. How can speech synthesis systems be adapted to support both speech and singing voice production from neutral speech corpora?

This research area focuses on developing text-to-speech (TTS) frameworks that extend beyond conventional speech synthesis to incorporate singing voice production without requiring dedicated singing databases. The motivation lies in the cost, feasibility, and flexibility challenges of recording supplementary singing corpora, especially when the original speaker is unavailable or unable to sing well. The key insight is integrating speech-to-singing (STS) conversion within unit selection or corpus-based TTS systems using neutral speech databases, enabling synthesis of expressive vocal outputs for applications like storytelling, assistive devices, and immersive experiences.

A unit selection text-to-speech-and-singing synthesis framework from neutral speech: proof of concept

by Joan Claudi Socoró Carrié

2023, EURASIP Journal on Audio, Speech, and Music Processing

Key finding: Introduced a unit selection-based TTS and singing (US-TTS&S) framework that integrates speech-to-singing conversion to generate both speech and singing from a single neutral speech corpus. The system was validated objectively...

Limited domain synthesis of expressive military speech for animated characters

by Lewis Johnson

2024, Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002.

Key finding: Developed an expressive speech synthesizer tailored for military training applications using corpus-based concatenative synthesis with samples classified by speaking style. The system exhibited versatile, high-quality...

Design and Development of a Text-To-Speech Synthesizer System

by VINEET CHAUHAN CHAUHAN

2023

Key finding: Provided an overview and design of TTS synthesizers using concatenative and formant synthesis approaches, highlighting unit selection and diphone synthesis. Emphasized the trade-offs between database size, naturalness, and...

Integrating a Voice Analysis-Synthesis System with a TTS Framework for Controlling Affect and Speaker Identity

by Ailbhe Chasaide

2025

Key finding: Integrated GlórCáil voice analysis-synthesis system into a DNN-based TTS framework to manipulate glottal source and vocal tract parameters globally, enabling control over speaker identity (gender, age) and affective coloring....

2. What computational and vocal models facilitate the control and learning of expressive vocal intonation and prosody, including for language learning and voice training?

This theme explores computational synthesis techniques and interactive training methods designed to improve vocal expressiveness, particularly intonation patterns and prosodic features critical for natural speech and singing. The focus includes how speech synthesis models are manipulated for expressive control and how novel interfaces support second language (L2) speakers in mastering challenging intonation, as well as models for professional voice training to optimize vocal and prosodic quality. These approaches provide actionable methods for enhancing voice performance through controlled vocal synthesis and targeted training.

Performative Vocal Synthesis for Foreign Language Intonation Practice

by Christophe d'Alessandro

2024, Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

Key finding: Demonstrated that real-time hand-gesture controlled vocal synthesis (Performative Vocal Synthesis, PVS) enables L2 learners (French speakers learning English intonation) to produce more comprehensible categorical intonation...

A Training Model for Improving Journalists' Voice

by Emma Rodero

2024, Journal of Voice

Key finding: Designed and experimentally validated a vocal training program to improve vocal and prosodic elements (breathing, articulation, loudness, pitch, jitter, speech rate, pauses, stress) in journalism students. Post-training...

Integrating a Voice Analysis-Synthesis System with a TTS Framework for Controlling Affect and Speaker Identity

by Ailbhe Chasaide

2025

Key finding: Provided a computational framework for manipulating glottal and vocal tract parameters to generate variations in affective expression and speaker identity within synthetic speech, revealing that global parameter shifts can...

3. How can physical and computational models of vocal fold physiology and acoustics improve understanding and simulation of voice production?

This theme surveys synthetic vocal fold models and numerical approaches that accurately represent the biomechanics and aerodynamics of phonation to better simulate human voice production. It includes the design of self-oscillating vocal fold models, quantification of vocal fold geometry, and stabilized finite element methods for wave equations in moving vocal tracts. These advances help elucidate the complex coupling of tissue vibration, airflow, and acoustics, yielding insights for synthesis, voice therapy, and model-based voice production research.

Synthetic, Self-Oscillating Vocal Fold Models for Voice Production Research

by Scott Thomson

2024, Journal of the Acoustical Society of America

Key finding: Provided a comprehensive review of two principal classes of synthetic self-oscillating vocal fold models—membranous (e.g., water-filled latex tubes) and elastic solid (e.g., multi-layered ultrasoft silicone)—detailing their...

Quantification of porcine vocal fold geometry

by Scott Thomson

2019, Journal of Voice

Key finding: Quantified 3D medial surface geometry of porcine vocal folds using microCT before and after freezing, finding ~5% non-uniform expansion due to freezing. Demonstrated qualitative similarity of porcine vocal fold geometry to...

A Stabilized Finite Element Method for the Mixed Wave Equation in an ALE Framework With Application to Diphthong Production

by Hector Espinoza

2024, Acta Acustica united with Acustica

Key finding: Proposed a subgrid scale stabilized finite element method (FEM) to solve the mixed form wave equation within an arbitrary Lagrangian-Eulerian (ALE) framework, addressing inf-sup compatibility and high-frequency oscillations...

Relaxation to one-dimensional postglottal flow in a vocal fold model

by Denisse Sciamarella

2022, Speech Communication

Key finding: Analyzed how inclusion of a finite relaxation length for the flow to transition to one-dimensionality downstream of the glottis affects low-order vocal fold voice production models. Demonstrated that shorter relaxation...

All papers in Voice Production

Toward a unified theory of voice production and perception

by Bruce Gerratt

At present, two important questions about voice remain unanswered: When voice quality changes, what physiological alteration caused this change, and if a change to the voice production system occurs, what change in perceived quality can...

Figure 2: The four-parameter source spectral model, fitted to the spectrum of a natural voice. The voice source was estimated via inverse filtering, and its spectrum was then calculated via fast Fourier transform. Differences in the amplitudes of individual harmonics are altered so that they conform to the slope of the appropriate model segment.

4. ADDITIONAL EVIDENCE FOR THE PSYCHOACOUSTIC MODEL

... phonation contrast in White Hmong, which has tones characterized by differences in both f0 and phonation type (breathy, modal, and creaky). Closed quotient was a good predictor of H1*-H2* (r = -0.6, p < .05), which in turn reliably distinguished breathy voice from modal and creaky voice.

Figure 3: The two-layer cover-body vocal fold model used in Zhang et al. (2013).

Table 1: Components of the psychoacoustic model of voice quality and associated voice synthesis parameters.

Table 2: The ratio of listener sensitivity (JND) to parameter variability across speakers, for the four source model parameters. Data from Kreiman et al. (in preparation).
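
As a rough illustration of the harmonic-amplitude measures mentioned in the excerpt above, the sketch below estimates an uncorrected H1-H2 (the level difference, in dB, between the first and second harmonics) from an FFT magnitude spectrum. It is not the authors' procedure: the excerpt reports H1*-H2*, which additionally corrects for formant influence, and that correction is omitted here. The function name `h1_minus_h2`, the Hann window, and the `search` tolerance are assumptions made for this example, and `f0` is assumed to be known in advance.

```python
import numpy as np


def h1_minus_h2(signal, fs, f0, search=0.1):
    """Uncorrected H1-H2 in dB (hypothetical helper, not from the paper).

    signal: 1-D samples of a sustained voiced sound
    fs:     sampling rate in Hz
    f0:     fundamental frequency in Hz, assumed known
    search: half-width of the peak search band, as a fraction of f0
    """
    x = np.asarray(signal, dtype=float)
    # Hann-windowed magnitude spectrum (the window choice is an assumption).
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)

    def harmonic_db(k):
        # Strongest spectral bin within +/- search * f0 of the k-th harmonic.
        band = (freqs >= (k - search) * f0) & (freqs <= (k + search) * f0)
        if not band.any():
            raise ValueError("no FFT bins near harmonic %d" % k)
        return 20.0 * np.log10(spectrum[band].max() + 1e-12)

    return harmonic_db(1) - harmonic_db(2)


# Synthetic check: second harmonic at half the amplitude of the first,
# so the level difference should be close to 20*log10(2), about 6 dB.
fs = 16_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 200.0 * t) + 0.5 * np.sin(2 * np.pi * 400.0 * t)
h1h2_db = h1_minus_h2(tone, fs, 200.0)
```

On real recordings the result depends on how well `f0` is estimated and on formant positions, which is why corrected variants such as H1*-H2* are used in practice.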

Laboratory Task - Voice Analysis

by Iveta Jamrozová

2016

Voice analysis can be used for many purposes, for example for diagnosis or for preventing voice damage. The voice can easily be damaged, especially through excessive or improper use. This work describes the basics...

Influence of asymmetric stiffness on the structural and aerodynamic response of synthetic vocal fold models

by B. Pickup

2009, Journal of Biomechanics

The influence of asymmetric vocal fold stiffness on voice production was evaluated using life-sized, self-oscillating vocal fold models with an idealized geometry based on the human vocal folds. The models were fabricated using flexible,...

Role of vortices in voice production: Normal versus asymmetric tension

by Randal Paniello

2009, The Laryngoscope

Objectives: Decreasing the closing speed of the vocal folds can reduce loudness and energy in the higher frequency harmonics, resulting in reduced voice quality. Our aim was to study the correlation between higher frequencies and the...

An Anthropometric Analysis of the Head and Face in Vocal Students

by Katarzyna Mehr

2013

The aim of the study was an anthropometric analysis of the values of selected cranial and facial indexes in vocal students and a comparison of these values with the standards for the same ethnic and age group of non-singing students....

Posterior cricoarytenoid muscle dynamics in canines and humans

by Juergen Neubauer

2014, The Laryngoscope

Objectives/Hypothesis: The posterior cricoarytenoid (PCA) muscle is the sole abductor of the glottis and serves important functions during respiration, phonation, cough, and sniff. The present study examines vocal fold abduction dynamics...

Effects of asymmetric superior laryngeal nerve stimulation on glottic posture, acoustics, vibration

by Juergen Neubauer

2013, The Laryngoscope

Objectives/Hypothesis: Evaluate the effects of asymmetric superior laryngeal nerve stimulation on the vibratory phase, laryngeal posture, and acoustics.

An Anthropometric Analysis of the Head and Face in Vocal Students

by Kowalkowska Iwona

2013

Vocal-fold collision mass as a differentiator between registers in the low-pitch range

by Erkki Vilkman

1995, Journal of Voice

Register shift between the chest and falsetto register is generally studied in the higher-than-speaking pitch range. However, a similar difference can also be produced at speaking pitch level. The shift from breathy "falsetto" phonation...

FIG. 3. An unsuccessful start of the chest register phonation of a female subject. From top to bottom: acoustic signal, glottal flow waveform, and the electroglottography (EGG) signal. The negative direction of the EGG stands for increasing vocal-fold contact. a: General view of the attempt. A few glottal cycles represent falsetto on the left-hand side and a new period of falsetto quality after the diplophonic phase and, finally, chest register phonation on the right-hand side. Time scale is 11.7 ms/div. b: Details of the diplophonic period of the unsuccessful start of the chest register phonation. Time scale is 3.3 ms/div. N.B.: The baseline of the glottal flow waveform is arbitrary.

FIG. 6. A cycle-to-cycle analysis of fundamental frequency (F0), open quotient (OQ), closing quotient (ClQ), speed quotient (SQ), and sound pressure level (SPL) measurements before and after the falsetto-chest register shift for a male subject. The dashed line indicates the moment of the register transition.

As to the background of the primo passaggio of females and the secondo passaggio of males, which lie at the same pitch area for both sexes, we have recently studied the characteristics of vocal-fold length changes with rising pitch in singing. It was found that the vocal-fold strain percentage was of the same order of magnitude for female and male subjects in the common register shift area (2). The register shift in this pitch area may, again, be explained by the critical mass hypothesis. The vertical thickness of the glottis reduces and the stiffness of the vocal-fold mucosa increases with lengthening of the vocal folds (growing strain). The register break occurs when the mucosal stiffness grows and/or the thyroarytenoid muscle can no longer maintain the vertical thickness of the glottis. Consequently, there are no possibilities for the critical mass formation, i.e., there is not a sufficient mucosal mass participating in the vibration to produce a strong enough collision for the chest register phonation. It has been reported that trained singers use shorter vocal folds or smaller strain for a given pitch than

TABLE 1. Short-time average values of speed quotient (SQ), open quotient (OQ), closing quotient (ClQ), fundamental frequency (F0) and sound pressure level (SPL) before and after the falsetto-chest register shift

vocal folds during chest voice phonation (cf. 20,21) is one sign of the tissue deformation caused by the collision. Our results concerning the differences between the glottal flow waveform in the falsetto and the chest register phonation in the low-pitch area might be explained on the basis of the concept presented by Titze (7) that the force with which the vocal processes are pressed together determines the shape of the flow glottogram and that the sharpness of the negative amplitude of its derivative correlates to the perceptual differences between the two registers. According to Titze (7), this point of discontinuity in the derivative gives rise to the perception of a chest phonation rich in higher harmonics as compared with the falsetto voice and can be seen "whenever the vocal processes are in contact."
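
The open, closing, and speed quotients tracked in FIG. 6 and TABLE 1 can be illustrated on a single glottal flow cycle. The sketch below assumes commonly used definitions (open quotient = open time / period, closing quotient = closing time / period, speed quotient = opening time / closing time). The excerpt does not spell out the authors' exact definitions or measurement procedure, so these should be read as assumptions. The function name `glottal_quotients` and the 5% amplitude threshold used to mark the open phase are likewise hypothetical choices for illustration.

```python
import numpy as np


def glottal_quotients(cycle, fs, rel_threshold=0.05):
    """Estimate F0, OQ, ClQ and SQ from one period of glottal flow.

    cycle:         1-D samples covering exactly one glottal period
    fs:            sampling rate in Hz
    rel_threshold: fraction of peak flow above which the glottis is
                   treated as open (an assumed, tunable value)
    """
    flow = np.asarray(cycle, dtype=float)
    flow = flow - flow.min()  # the flow baseline is arbitrary
    peak = int(np.argmax(flow))
    if flow[peak] == 0.0:
        raise ValueError("cycle is flat; no open phase found")

    open_idx = np.flatnonzero(flow > rel_threshold * flow[peak])
    start, end = int(open_idx[0]), int(open_idx[-1])
    opening = peak - start  # samples from glottal opening to peak flow
    closing = end - peak    # samples from peak flow to glottal closure
    if closing == 0:
        raise ValueError("closing phase shorter than one sample")

    n = flow.size
    return {
        "f0_hz": fs / n,           # one period spans n samples
        "oq": (end - start) / n,   # open quotient
        "clq": closing / n,        # closing quotient
        "sq": opening / closing,   # speed quotient
    }


# Synthetic triangular pulse: slow opening (40 samples), fast closing (20).
fs = 10_000
cycle = np.zeros(100)
cycle[10:51] = np.linspace(0.0, 1.0, 41)
cycle[50:71] = np.linspace(1.0, 0.0, 21)
quotients = glottal_quotients(cycle, fs)
```

For this synthetic pulse the speed quotient comes out near 2, reflecting an opening phase about twice as long as the closing phase. Measured flow waveforms are noisier, so the threshold and the cycle segmentation would need more care than this sketch takes.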

Revisiting the two-mass model of the vocal folds

by Florencia Assaneo

2013, Papers in Physics

Realistic mathematical modeling of voice production has been recently boosted by applications to different fields like bioprosthetics, quality speech synthesis and pathological diagnosis. In this work, we revisit a two-mass model of the...

Influence of subglottic stenosis on the flow-induced vibration of a computational vocal fold model

by Scott Thomson

2013, Journal of Fluids and Structures

The effect of subglottic stenosis on vocal fold vibration is investigated. An idealized stenosis is defined, parameterized, and incorporated into a two-dimensional, fully coupled finite element model of the vocal folds and laryngeal...
