Academia.edu

Speaker Characterization

8 papers
11 followers
About this topic
Speaker characterization is the analytical process of identifying and describing the distinctive traits, attributes, and vocal qualities of a speaker in spoken discourse. This field examines aspects such as tone, pitch, accent, and speech patterns to understand the speaker's identity, intentions, and emotional state within a communicative context.

Key research themes

1. How do voice quality variations influence the perceived personality traits and charisma of a speaker?

This research area investigates how different laryngeal and supralaryngeal voice qualities produced by the same individual affect listeners’ perceptions of that speaker’s personality traits and charisma. It matters because voice quality conveys social and emotional cues crucial for interpersonal communication, speaker profiling, and forensic applications.

Key finding: This study found that voice quality variations, including modal, creaky, breathy (natural and artificial), nasalization, and smiling, produced by the same speakers significantly impacted listener ratings on personality traits... Read more
Key finding: Listeners showed generally low accuracy (~33%, chance level) in judging speakers’ personality traits from speech alone, although some traits like Aggression and Social Potency had slightly higher recognition rates.... Read more
Key finding: This paper identifies fundamental frequency (f0) measures that best correlate with perceived speaker charisma, finding that mean f0 is the most effective pitch-level metric while kurtosis and 80-percentile f0 range optimally... Read more

2. What acoustic and phonetic features capture speaker-specific variability in spontaneous and controlled speech for speaker characterization and recognition?

This research theme focuses on identifying phonetic, acoustic, and articulatory features that characterize speaker individuality across different speech styles (read and spontaneous) and linguistic contexts. This theme is vital for improving speaker recognition systems, forensic voice comparison, and understanding within-speaker variability versus between-speaker differences.

Key finding: The study extended prior findings from read speech to spontaneous speech for the same 99/100 talkers and showed that acoustic voice spaces remain highly similar across speaking styles, with fundamental frequency variability... Read more
Key finding: Using a large corpus of Japanese vowels produced in varied phonetic contexts, this study demonstrated that coarticulation affects lower-formant related sub-bands more strongly, whereas speaker effects dominate higher-formant... Read more
Key finding: This presentation summarized exploratory analysis on the complex interaction between speaker differences and phonetic context using cepstral measures, highlighting the importance of quantifying relative contributions of... Read more
Key finding: This paper investigated the phoneme distributions within Gaussian Mixture Model (GMM) clusters representing speakers, revealing that certain phonetic segments contribute disproportionately to speaker modeling efficacy. The... Read more

3. How can speaker demographic traits such as age, height, and physiognomic factors be automatically estimated from speech using i-vector frameworks and machine learning?

This theme investigates computational methods, especially i-vector representations combined with regression and classification models, to infer speaker profile traits like age and height from speech. These traits offer valuable auxiliary information in forensic cases, user profiling, and personalized human-computer interaction systems. Understanding effectiveness, limitations, and variability factors improves model design and forensic applicability.

Key finding: The thesis developed novel approaches for estimating speaker age, height, weight, and smoking habits from spontaneous telephone speech using i-vector and Non-negative Factor Analysis (NFA) frameworks combined with Artificial... Read more
Key finding: The study proposed an age estimation method leveraging i-vectors and Within-Class Covariance Normalization, followed by Least Squares Support Vector Regression, achieving lower mean absolute error and higher correlation with... Read more
Key finding: This paper presented an automatic speaker height estimation approach using i-vectors and regression models (ANN and LSSVR), yielding effective height predictions on the NIST 2008 and 2010 SRE corpora. This contributes to the... Read more

All papers in Speaker Characterization

This study aimed to analyze the impact of the amount of data on the discriminatory performance of acoustic-phonetic parameters, some of which are frequently assessed in forensic speaker comparisons. Parameters from three distinct phonetic... more
Objectives. To assess the speaker-discriminatory potential of a set of fundamental frequency estimates in intraidentical twin pair comparisons and cross-pair comparisons (i.e., among all speakers). Participants. A total of 20 Brazilian... more
In forensic voice comparison, it is strongly recommended to follow the Bayesian paradigm when presenting forensic evidence to the court. In this paradigm, the strength of the forensic evidence is summarized by a likelihood ratio (LR). But in... more
This study investigated the production of six Cantonese tones by heritage language (HL) children in Vancouver, Canada. Twenty-five Cantonese heritage speakers (HSs) aged between 2;1 and 6;0 participated in the production experiment. Data... more
Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics, etc. Often in such applications only a few seconds of speech data are available to reliably estimate the speaker... more
Support vector machine (SVM) has been proven as a powerful tool for solving age and gender classification problems. However, SVM is sensitive to noise and outliers. In this paper we propose a new fuzzy SVM based on an assumption that... more
The objective of the proposed work is to analyze and study the use of i-vectors for Anomalous Detection of Sounds (ADS) in Machines. I-vectors, to the best of our knowledge, have not been studied for machine sounds. We will be using the... more
In the provision of linguistic evidence as one of the foci in Forensic Linguistics, Forensic Speaker Verification (FSV) includes an analysis of speech recordings to verify the voice of a criminal. As an inquiry into the validity of the... more
This paper demonstrates the potential of the sub-band parametric cepstral distance (PCD) formulated by Clermont and Mokhtari (1994), as an alternative to formants in acoustic phonetic research. As a cepstrum-based measure, the PCD is... more
Purpose Previous studies showed both early and late acquisition of Cantonese tones based on transcription data using different criteria, but very little acoustic data were reported. Our study examined Cantonese tone acquisition using both... more
Recent advances in the field of speaker recognition have proved to highly outperform algorithms. However this performance degrades when limited data are presented. This paper presents examples on how SVM can improve speaker recognition.... more
Automatic Speaker Recognition (ASR) systems have emerged as an important means of confirming identity in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in criminological... more
Speech under a face cover constitutes a case that forensic speech experts encounter increasingly often. A face cover is mostly worn when an individual strives to conceal his or her identity. Based on the material of face cover and the level... more
Most of the existing literature on i-vector-based speaker recognition focuses on recognition problems, where i-vectors are extracted from speech recordings of sufficient length. The majority of modeling/recognition techniques therefore... more
An acoustic-phonetic forensic-voice-comparison system extracted information from the formant trajectories of tokens of Standard Chinese /iau/. When this information was added to a generic automatic forensic-voice-comparison system, which... more
Examples are given of forensic voice comparison with higher level features in real-world cases and research. A pilot experiment relating to estimation of strength of evidence in forensic voice comparison is described which explores the... more
This paper proposes a new speaker age estimation method that uses an age-dependent insensitive loss. Most conventional speaker age estimation frameworks ignore the ambiguity of a perceptual speaker age. These “over-sensitive” frameworks... more
We propose two machine learning improvements on the existing architecture of voice- and speaker-recognition software. Where conventional systems extract two kinds of frequency data from voice recordings and use the concatenation as input, we... more
Voice controlled applications can be a great aid to society, especially for physically challenged people. However this requires robustness to all kinds of variations in speech. A spoken language understanding system that learns from... more
Audio classification has applications in a variety of contexts, such as automatic sound analysis, supervised audio segmentation and in audio information search and retrieval. Extended Baum-Welch (EBW) transformations are most commonly... more
Speech signals convey important paralinguistic information such as age, gender, body size, language, accent and emotional state of speakers. Automatic identification of speaker traits and states has a wide range of forensic, commercial... more
Voice recognition systems are used to distinguish different sorts of voices. However, recognizing a voice is not always successful due to the presence of different parameters. Hence, there is a need to create a set of estimation criteria... more
Speech-based communication is one of the most preferred modes of communication for humans. The human voice contains several important pieces of information and clues that help in interpreting the voice message. The gender of the speaker can be... more
This paper proposes an automatic smoking habit detection from spontaneous telephone speech signals. In this method, each utterance is modeled using i-vector and non-negative factor analysis (NFA) frameworks, which yield low-dimensional... more
This project presents an approach to classify speakers on the basis of their age and gender. Short term features and long term features have been extracted from the voice sample of each speaker. These have been used to train Support... more
This paper focuses on the automatic detection of a person's blood alcohol level based on automatic speech processing approaches. We compare 5 different feature types with different ways of modeling. Experiments are based on the ALC corpus... more
Autism Spectrum Disorder (ASD) is on the rise. Earlier identification of ASD gives the best outcome, allowing someone to be safe and healthy through proper nursing. Humans can hardly estimate the present condition and stage of... more
This work presents an investigation into how to define Neural Network (NN) architectures using a data-driven approach in which clustering creates sub-labels to facilitate the learning process and to discover the number of neurons needed... more
Fundamental frequency has been used for a long time in speaker identification (Braun, 1995; Rose, 2003). The within-speaker variation in F0 is affected by several factors. In Braun (1995), they are categorized as technical, physiological... more
In automatic speech recognition (ASR) the non-linear data projection provided by a one hidden layer multilayer perceptron (MLP), trained to recognise phonemes, has previously been shown to provide feature enhancement which can... more
We investigate the problem of predicting the quality of automatic speech recognition (ASR) output under the following rigid constraints: i) reference transcriptions are not available, ii) confidence information about the system that... more
In this paper, we propose an integration of random subspace sampling and Fishervoice for speaker verification. In the previous random sampling framework [1], we randomly sample the JFA feature space into a set of low-dimensional... more
Human body characteristics such as fingerprints, retinas and irises, facial structure, and voice recognition are just some of the many biometric fields being researched today. These characteristics are unique to each individual, then... more
The performance of a speaker recognition system is highly dependent on the amount of speech used in enrollment and test. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in... more
This paper presents statistical data for the fundamental frequency of 100 young male speakers of Standard Southern British English producing spontaneous speech under cognitive stress. The material comes from the new DyViS database, for... more
Congenital amusia is a neurogenetic disorder affecting musical pitch processing. It also affects lexical tone perception. It is well documented that noisy conditions impact speech perception in second language learners and cochlear... more
Important problems in speech soft biometrics include the prediction of speaker's age or gender. Here, the aforementioned problems are addressed in the context of utterances collected during a long time period. A unified framework for age... more
One of the world's chronic neuro-degenerative diseases, Alzheimer's Disease (AD), leads its sufferers, among other symptoms, to suffer from speech difficulties. In particular, the inability to recall vocabulary which makes patients'... more
The aim of automatic pathological voice detection systems is to serve as tools for medical specialists, enabling a more objective, less invasive and improved diagnosis of diseases. In this respect, the gold standard for those systems include... more
Automatic detection of a baby cry in audio signals is an essential step in applications such as remote baby monitoring. It is also important for researchers, who study the relation between baby cry patterns and various health or... more
ASR performance, for current systems, degrades dramatically when there is a mismatch between the training and testing conditions, for instance due to the presence of other sound sources (Lippmann 1997). However, many potential... more