This study aimed to analyze the impact of the amount of data on the discriminatory performance of acoustic-phonetic parameters, some of which are frequently assessed in forensic speaker comparisons. Parameters from three distinct phonetic domains were considered, namely spectral, melodic, and temporal; they were assessed both separately within the same phonetic domain and in combination. The speech material consisted of spontaneous telephone conversations between two subjects. During the recording sessions, the participants were placed in different rooms, unable to see, hear, or otherwise interact with each other directly. The speakers were encouraged to start a conversation using a mobile phone while being simultaneously recorded. All recordings were carried out at high resolution (44.1 kHz, 16-bit). Data segmentation and transcription were performed in the Praat software [1]. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area, aged 19 to 35 years (mean 26.4 years). Although the subjects (10 identical twin pairs) were recruited from a twin research project, cf. [2, 3, 4], the focus here was on comparisons among all speakers (i.e., 190 inter-speaker comparisons) rather than on individual twin pairs.

Two metrics of discriminatory performance were examined in the R software [5] across the comparisons among all speakers, using the script fvclrr [6]: the log-likelihood-ratio cost (Cllr) and the equal error rate (EER). The Cllr metric is an empirical estimate of the quality of the likelihood ratios a system outputs. The EER captures the point where the false rejection rate (type I error) and the false acceptance rate (type II error) are equal and describes the overall accuracy of a system. Lower Cllr and EER values indicate better discriminatory performance, whereas higher values suggest the opposite. A cross-validation procedure was adopted for the calculation of likelihood ratios: multiple pairwise comparisons were performed across individuals, with the background sample consisting of data from all speakers except the two being directly compared.

To assess the impact of the amount of data on discriminatory performance, a straightforward approach was adopted. Starting from a larger data set extracted from about 2.5 min of recording per speaker, random data points were additively selected for the analyses. The minimum number of data points (acoustic measurements) per speaker was set at 2, the smallest number that still allows intra-speaker variability to be assessed. Randomly selected data points were then progressively added, two at a time, and new Cllr and EER values were computed for each resampled data set, up to a maximum of 30 data points per speaker. Given the nature of the speech material, the number of samples produced per subject varied; to minimize selection bias, the random downsampling procedure was therefore repeated 200 times using the R package recipes [7]. Median Cllr and EER values were reported after performing the tests on the randomly selected data, with the different estimates fused and calibrated through a logistic regression technique, cf. [8].
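For reference, the Cllr metric follows a standard formulation in the likelihood-ratio literature; assuming fvclrr [6] adopts this conventional definition, Cllr is computed as

    C_{llr} = \frac{1}{2}\left[ \frac{1}{N_{ss}} \sum_{i=1}^{N_{ss}} \log_2\!\left(1 + \frac{1}{LR_{ss,i}}\right) + \frac{1}{N_{ds}} \sum_{j=1}^{N_{ds}} \log_2\!\left(1 + LR_{ds,j}\right) \right]

where N_ss and N_ds are the numbers of same-speaker and different-speaker comparisons and LR_{ss,i} and LR_{ds,j} are the corresponding likelihood ratios. A perfectly discriminating, well-calibrated system yields a Cllr near 0, while a value of 1 corresponds to a system that always outputs LR = 1 and thus provides no useful information.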
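The additive resampling procedure can be sketched in R roughly as follows. This is a minimal illustration, not the study's actual pipeline: the data layout, the column names, and compute_metrics() are hypothetical placeholders, with the real Cllr/EER computation performed by the fvclrr script [6] and the downsampling handled through recipes [7].

    ## Hypothetical layout: one row per acoustic measurement, with a
    ## speaker identifier; values here are synthetic stand-ins.
    set.seed(1)
    df <- data.frame(
      speaker   = rep(sprintf("S%02d", 1:20), each = 40),
      f0_median = rnorm(800),
      art_rate  = rnorm(800)
    )

    compute_metrics <- function(d) {
      ## Placeholder for the cross-validated LR scoring plus the
      ## Cllr/EER computation done with fvclrr [6] in the study.
      c(cllr = NA_real_, eer = NA_real_)
    }

    results <- list()
    for (n in seq(2, 30, by = 2)) {      # data points per speaker
      reps <- replicate(200, {           # 200 random downsamplings
        resampled <- do.call(rbind, lapply(split(df, df$speaker),
          function(s) s[sample(nrow(s), n), , drop = FALSE]))
        compute_metrics(resampled)
      })
      ## Report the median Cllr and EER over the 200 repetitions
      results[[as.character(n)]] <- apply(reps, 1, median, na.rm = TRUE)
    }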
Melodic and temporal parameters were extracted from speech chunks with an average (mean and median) temporal window of 3 s, corresponding to inter-pause intervals in most cases. Spectral parameters were extracted from the midpoints of /a/ vowels in stressed and unstressed positions; these monophthongs displayed a mean duration of 67 ms and a median duration of 84 ms. After manual segmentation, all parameters were extracted automatically using a Praat script [9]. Four models were compared: Model 1 (M1) combines the melodic parameters (f0 median and f0 base value); Model 2 (M2) combines the temporal parameters (speech rate and articulation rate); Model 3 (M3) combines the spectral parameters (F3 and F4); and Model 4 (M4) combines all of the acoustic-phonetic parameters.
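As an illustration of the logistic-regression fusion and calibration step, cf. [8], a minimal R sketch is given below. The data layout and column names are assumptions; unlike the study, this toy version fits and evaluates on the same comparisons rather than under cross-validation.

    ## scores: one row per pairwise comparison, one column of scores
    ## per model (e.g., M1-M3); same: TRUE for same-speaker pairs.
    fuse <- function(scores, same) {
      d   <- cbind(scores, same = as.numeric(same))
      fit <- glm(same ~ ., data = d, family = binomial)
      ## The fitted log-odds minus the training prior log-odds gives a
      ## fused, calibrated log-likelihood ratio per comparison.
      predict(fit, type = "link") - qlogis(mean(same))
    }

    ## Toy usage with synthetic scores for 190 comparisons:
    set.seed(1)
    scores    <- data.frame(m1 = rnorm(190), m2 = rnorm(190), m3 = rnorm(190))
    same      <- sample(c(TRUE, FALSE), 190, replace = TRUE)
    fused_llr <- fuse(scores, same)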