Academia.edu

Speaker Verification

2,074 papers
6,169 followers
About this topic
Speaker verification is a biometric authentication process that uses voice characteristics to confirm an individual's identity. It involves analyzing vocal attributes, such as pitch, tone, and speech patterns, to determine if the speaker matches a pre-registered voice model, ensuring secure access to systems or information.
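The matching step described above can be illustrated with a minimal sketch. It assumes that the enrolled voice model and the test utterance have each already been reduced to a fixed-length embedding vector, and that cosine similarity against a threshold serves as the decision rule. The function names and the threshold value of 0.7 are hypothetical choices for illustration, not taken from any paper listed here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enrolled_embedding, test_embedding, threshold=0.7):
    """Accept the identity claim if the similarity reaches the threshold.

    The threshold is an illustrative assumption; in practice it would be
    tuned on held-out data for the target operating point.
    """
    score = cosine_similarity(enrolled_embedding, test_embedding)
    return score >= threshold, score
```

In a real system the embeddings would come from a trained front end (for example an i-vector or x-vector extractor), which is outside the scope of this sketch.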

Key research themes

1. How can speaker verification systems be robustly defended against diverse spoofing attacks including voice conversion, speech synthesis, and replay?

This research area focuses on understanding the vulnerabilities of automatic speaker verification (ASV) systems to a broad range of spoofing attacks, such as voice conversion, speech synthesis, and replay attacks, which pose severe security threats. It also investigates the design and evaluation of anti-spoofing countermeasures, including databases, protocols, and methodologies to detect and mitigate both known and unknown spoofing types, particularly in the context of text-independent ASV systems. The work is significant because spoofing can undermine the reliability of ASV systems deployed in real-world applications such as call centers, banking, and forensic investigations.

Key finding: This study introduces the first comprehensive spoofing and anti-spoofing (SAS) database comprising nine diverse spoofing techniques (including multiple speech synthesis and voice conversion systems) for text-independent... Read more
Key finding: Describes the community-driven ASVspoof initiative that addresses the lack of common datasets and standardized protocols by providing the ASVspoof 2015 dataset and organizing competitive evaluations, demonstrating the... Read more
Key finding: Provides a detailed survey of vulnerabilities unique to text-independent ASV systems, emphasizing how prior countermeasures often rely on known spoofing attacks and lack generalizability. It highlights the need for standard... Read more
Key finding: Presents a novel joint modeling approach in the i-vector subspace that simultaneously addresses speaker verification and voice conversion spoofing attack detection without relying on tailored discriminative features. By... Read more
Key finding: Provides an extensive taxonomy and comprehensive experimental comparison of spoofing countermeasures across diverse feature extraction and classification paradigms, examining their generalizability on ASVspoof2019 and VSDC... Read more

2. What techniques improve speaker verification performance and robustness under practical conditions such as limited data, language mismatch, recording channel variability, and multi-speaker environments?

This research theme focuses on enhancing speaker verification accuracy and reliability in realistic and challenging conditions. It includes methods dealing with limited-duration speech segments, channel distortions (e.g., GSM transcoded speech), multilingual and cross-lingual mismatches, and speaker overlap situations. The research addresses acoustic feature design, fusion of complementary feature sets, model adaptation, and joint optimization strategies to maintain verification performance in heterogeneous real-world scenarios.

Key finding: Demonstrates that combining vocal tract features (MFCC, LPCC) with excitation source features (LPR, LPRP) using feature- and score-level fusion significantly reduces equal error rate (EER) in i-vector based speaker... Read more
Key finding: Empirically shows that both automatic speaker recognition systems based on i-vectors/x-vectors and human listeners experience performance degradation when comparing recordings that differ in language and recording time. The... Read more
Key finding: Proposes a novel data-dependent score fusion algorithm that computes adaptive weights for fusing multiple utterance scores in GSM-transcoded speech speaker verification, using prior knowledge from enrollment scores. This... Read more
Key finding: Introduces an integrated approach combining feature-scale single-channel speech separation with back-end speaker verification, using neural network-based separation models and MFCC-T features. The proposed method trains both... Read more
by ab kh
Key finding: Finds that i-vector-based speaker identification systems outperform Gaussian mixture model (GMM) methods, especially when combined with PLDA classifiers and features like PNCC and RASTA-PLP, and that augmenting features with... Read more
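Several of the findings above report results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of one way to estimate it from lists of genuine and impostor trial scores, assuming higher scores indicate a better match. The function name and the simple threshold sweep are illustrative assumptions; evaluation toolkits typically interpolate between thresholds.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    FAR: fraction of impostor trials accepted (score >= threshold).
    FRR: fraction of genuine trials rejected (score < threshold).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

A lower EER means genuine and impostor score distributions overlap less, which is why feature or score fusion is judged by how far it pushes this value down.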

3. How can speaker verification fairness across demographic and language groups be improved without requiring subgroup labels or creating reliance on balanced data samples?

This research area addresses performance disparities in speaker verification systems arising from imbalanced representation of demographic groups such as gender and nationality, or language variability. The focus is on algorithmic fairness approaches that automatically identify underperforming groups without explicit annotations, using adversarial learning, group-adapted embeddings, fusion networks, and reweighting schemes. This direction is crucial for equitable deployment of speaker verification in diverse real-world populations and for mitigating biases inherent in training data.

Key finding: Reformulates adversarial reweighting (ARW) for speaker verification with metric learning, enabling the adversarial network to assign higher weights to poorly performing instances without subgroup annotations. Demonstrates... Read more
Key finding: Proposes a modular network architecture combining group-specific embedding adaptation and score fusion to mitigate model unfairness caused by imbalanced gender representation during training. Experiments show that this... Read more
Key finding: Develops an ensemble-based deep learning framework integrating gender and ethnicity classifiers with a Siamese verification network, and demonstrates improved equal error rates and decision cost functions on the large-scale... Read more
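Alongside the equal error rate, the last finding above mentions decision cost functions. As a hedged illustration, the sketch below computes one widely used form, a weighted sum of the miss and false-alarm probabilities. The default costs and target prior are placeholder assumptions, not the values used in any paper listed here.

```python
def detection_cost(p_miss, p_false_alarm, c_miss=1.0, c_fa=1.0, p_target=0.01):
    """Weighted detection cost at a single operating point.

    p_miss:        probability of rejecting a genuine speaker.
    p_false_alarm: probability of accepting an impostor.
    c_miss, c_fa:  relative costs of the two error types (assumed values).
    p_target:      prior probability that a trial is genuine (assumed value).
    """
    return c_miss * p_miss * p_target + c_fa * p_false_alarm * (1.0 - p_target)
```

Computing this cost separately for each demographic subgroup is one simple way to expose the performance gaps that fairness-oriented methods aim to reduce.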

All papers in Speaker Verification

This paper proposes a text-independent automatic speaker verification system using IMFCC (Inverse/Reverse Mel Frequency Coefficients) and IT-EM (Information Theoretic Expectation Maximization). To perform speaker verification, feature... more
An emotion-based speaker identification system automatically identifies a speaker’s emotion based on features extracted from speech waves. This paper presents an experiment in building and testing a Speaker’s emotion... more
In this paper, a new method to deal with automatic speaker verification based on band-limited phase-only correlation (BLPOC) is proposed. The aim of this study is to validate the use of the BLPOC function as a new limited-data automatic... more
The aim of this paper is to present an Arabic speech database that represents Arabic native speakers from all the cities of Saudi Arabia. The database is called the Saudi Accented Arabic Voice Bank (SAAVB). Preparing the prompt sheets,... more
In this paper, we evaluate a recently proposed spectral envelope estimation method, stabilized weighted linear prediction (SWLP), in the feature extraction stage of a large vocabulary continuous speech recognizer (LVCSR) system. Using... more
It is known that the Percentage of Identification Accuracy (PIA) of Automatic Speaker Recognition (ASR) systems is vulnerable to real-time degradations such as noise and channel distortion. This study presents a novel class SVM and... more
In this paper, a multimodal person verification system is presented. The system is based on face and voice modalities. Fusion of information derived from each modality is performed at the matching score level using sum rule. For face... more
remains an open problem that has not been satisfactorily solved by existing recognition techniques. In this paper, we tackle this problem using a variant of the recently proposed Probabilistic Linear Discriminant Analysis (PLDA). We show... more
ABSTRACT: Previous research has shown both that listeners' ability to detect high quality voice imitation results in judicially worrying misidentification rates (Schlichting & Sullivan, 1997) and that the semantic expectation of the... more
Spoken language identification is the process by which the language in a spoken utterance is recognized automatically. Spoken language identification is commonly used in speech translation systems, in multi-lingual speech recognition, and... more
In this paper, we describe the use of Artificial Neural Network (ANN) to compute the acoustic features in analysing forensic speaker verification. In the computation, there are two datasets derived from speech recording of a simulated... more
Speech processing has emerged as one of the important application areas of digital signal processing. Research fields within speech processing include speech recognition, speaker recognition, speech synthesis, speech coding etc.... more
Detecting duplicate patient participation in clinical trials is a major challenge because repeated patients can undermine the credibility and accuracy of the trial's findings and result in significant health and financial risks.... more
We present a discriminative learning framework for Gaussian mixture models (GMMs) used for classification based on the extended Baum-Welch (EBW) algorithm. We suggest two criteria for discriminative optimization, namely the class... more
This paper introduces the Voices Obscured In Complex Environmental Settings (VOICES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by... more
In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof... more
Speaker recognition systems achieved significant improvements over the last decade, especially due to the performance of the i-vectors. Despite the achievements, mismatch between training and test data affects the recognition performance... more
Speaker Recognition systems exhibit a decrease in performance when the input speech is not in optimal circumstances, for example when the user is under emotional or stress conditions. The objective of this paper is measuring the effects... more
Speaker verification is crucial in biometric security systems, but performance degradation occurs under mismatched noise and recording conditions. This paper explores acoustic feature learning techniques to enhance robustness in speaker... more
Human listeners are able to understand speech in the presence of a noisy background. How to simulate this perceptual ability remains a great challenge. This paper describes a preliminary evaluation of intelligibility of the output of a... more
The goal of the ASVspoof challenges is to evaluate countermeasures to spoofing attacks on automatic speaker verification systems. We first analyze in more detail the results of the baseline systems provided by the organization and unveil several... more
In this paper, we present the winning BUT submission for the text-dependent task of the SdSV challenge 2020. Given the large amount of training data available in this challenge, we explore successful techniques from text-independent... more
This paper presents the approach used in the BirdCLEF 2020 Competition. The objective of the competition is to recognize bird species from their songs and calls, among 960 species, in soundscapes. We use a MultiScale CNN + Triplet... more
Speaker Verification (SV) is a task to verify the claimed identity of the claimant using his/her voice sample. Though there exists an ample amount of research in SV technologies, the development concerning a multilingual conversation is... more
Automatic speaker verification (ASV) systems are vulnerable to spoofing attacks. We propose a spoofing-robust ASV system optimized directly for the recently introduced architecture-agnostic detection cost function (a-DCF), which allows... more
Research has shown that handset selectors can be used to assist telephone-based speech/speaker recognition. Most handset selectors, however, simply select the most likely handset from a set of known handsets even for speech coming from an... more
In speaker verification, a claimant may produce two or more utterances. In our previous study, we proposed to compute the optimal weights for fusing the scores of these utterances based on their score distribution and our prior knowledge... more
In telephone-based speaker verification, the channel conditions can vary significantly from session to session. Therefore, it is desirable to estimate the channel conditions online and compensate the acoustic distortion without... more
In speaker verification, a claimant may produce two or more utterances. Typically, the scores of the speech patterns extracted from these utterances are averaged and the resulting mean score is compared with a decision threshold. Rather... more
Fusion techniques have been widely used in multi-modal biometric authentication systems. While these techniques are mainly applied to combine the outputs of modality-dependent classifiers, they can also be applied to fuse the decisions or... more
In many biometric systems, the scores of multiple samples (e.g. utterances) are averaged and the average score is compared against a decision threshold for decision making. The average score, however, may not be optimal because the... more
This paper proposes an articulatory feature-based conditional pronunciation modeling (AFCPM) technique for speaker verification. The technique models the pronunciation behaviors of speakers by creating a link between the actual phones... more
Acoustic mismatch between the training and recognition conditions presents one of the serious challenges faced by speaker recognition researchers today. The goal of channel compensation is to achieve performance approaching that of a... more
This paper proposes a speaker verification system based on articulatory feature-based conditional pronunciation modeling (AFCPM). The system captures the pronunciation characteristics of speakers by modeling the linkage between the actual... more
This paper presents an approach that uses articulatory features (AFs) derived from spectral features for telephone-based speaker verification. To minimize the acoustic mismatch caused by different handsets, handset-specific normalization... more
Because of the differences in education background, accents, etc., different persons have their unique way of pronunciation. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation... more
To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However, it is also important to ensure that the channel compensation algorithms in these systems suppress channel... more
The ETSI has recently published a front-end processing standard for distributed speech recognition systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic... more
With the rise of crime at automated teller machines (ATMs), ATM security is at stake. Traditional security methods such as passwords or PINs have always worried users because they can be lost, stolen or... more
High-level characteristics such as word usage, pronunciation, phonotactics, prosody, etc., have seen a resurgence for automatic speaker recognition over the last several years. With the availability of many conversation sides per speaker... more
This paper introduces a modeling flow for predicting waveforms as a function of parameters, variables in the system generating the waveforms. In order to achieve this goal, a neural network is involved. The model is developed using... more
This thesis attempts to solve the problem of authorship verification. Authorship verification is a subdomain of authorship analysis and its origins lie in stylometry analysis. However most of the research in authorship analysis is based... more
Usually the mel-frequency cepstral coefficients are estimated either from a periodogram or from a windowed periodogram. We state a general estimator which also includes multitaper estimators. We propose approximations of the variance and... more
Speaker verification is the process of accepting or rejecting a claimed identity based on the speaker's voice features. A speaker verification system can be used in numerous security applications, including bank account access, getting to security... more
This paper addresses the problem of speaker verification in two speaker conversations, proposing a set of confidence measures to assess the quality of a given speaker segmentation. In addition we study how these measures can be used to... more
Audio deepfakes, a subset of deepfake technology, employ machine learning or deep learning to create deceptive audio content by synthesizing authentic recordings. Such deepfakes not only foster the dissemination of misinformation but... more
In recent years, the i-vector approach has become the state of the art in speaker recognition systems. As in previous approaches, i-vector-based systems suffer greatly in the presence of additive noise, especially in low SNR cases. In this... more
The performance gains in speaker verification over the past fifteen years are undeniable. Systems are increasingly reliable in the sense that EER and DCF rates decrease year after year. Nevertheless, it is necessary to... more
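Several abstracts in this list describe score-level decisions: averaging the scores of multiple utterances before comparing against a threshold, and fusing modality scores with the sum rule. The sketch below illustrates both under simple assumptions. The function names are hypothetical, and the min-max normalization applied before the sum rule is an added assumption to put scores from different matchers on a common scale; it is not stated in the abstracts.

```python
def accept_by_mean_score(utterance_scores, threshold):
    """Average the scores of a claimant's utterances and threshold the mean."""
    if not utterance_scores:
        raise ValueError("at least one utterance score is required")
    mean_score = sum(utterance_scores) / len(utterance_scores)
    return mean_score >= threshold

def min_max_normalize(score, lowest, highest):
    """Map a raw matcher score to [0, 1] given that matcher's score range."""
    return (score - lowest) / (highest - lowest)

def sum_rule_fusion(normalized_scores):
    """Sum-rule fusion: add the normalized scores of each modality."""
    return sum(normalized_scores)
```

For example, a face score of 30 on a 0 to 100 scale and a voice score of 0.5 on a -1 to 1 scale normalize to 0.3 and 0.75, and the sum rule then yields a fused score of about 1.05. The data-dependent weighting proposed in some of the abstracts would replace the plain mean with a weighted one.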