Academia.edu

Speaker Verification

2,075 papers
6,176 followers
About this topic
Speaker verification is a biometric authentication process that uses voice characteristics to confirm an individual's identity. It involves analyzing vocal attributes, such as pitch, tone, and speech patterns, to determine if the speaker matches a pre-registered voice model, ensuring secure access to systems or information.
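
As a rough illustration of the matching step described above, the sketch below compares a test voice representation with a pre-registered one and accepts the claim when their similarity clears a threshold. The fixed-length vectors, the cosine measure, the function names, and the 0.7 threshold are all assumptions made for illustration; the topic description does not specify how the voice model is built or scored.

```python
import math


def cosine_score(enrolled, test):
    """Cosine similarity between two equal-length voice feature vectors."""
    if len(enrolled) != len(test):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(enrolled, test))
    norm = math.sqrt(sum(a * a for a in enrolled)) * math.sqrt(sum(b * b for b in test))
    if norm == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / norm


def verify(enrolled, test, threshold=0.7):
    """Accept the identity claim if the similarity reaches the threshold.

    The threshold value is hypothetical; in practice it would be tuned
    on held-out trials.
    """
    return cosine_score(enrolled, test) >= threshold
```

Raising the threshold trades fewer false acceptances for more false rejections, which is why systems are usually compared at an operating point such as the equal error rate.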

Key research themes

1. How can speaker verification systems be robustly defended against diverse spoofing attacks including voice conversion, speech synthesis, and replay?

This research area focuses on understanding the vulnerabilities of automatic speaker verification (ASV) systems to a broad range of spoofing attacks, such as voice conversion, speech synthesis, and replay attacks, which pose severe security threats. It also investigates the design and evaluation of anti-spoofing countermeasures, including databases, protocols, and methodologies to detect and mitigate both known and unknown spoofing types, particularly in the context of text-independent ASV systems. The work is significant because spoofing can undermine the reliability of ASV systems deployed in real-world applications such as call centers, banking, and forensic investigations.

Key finding: This study introduces the first comprehensive spoofing and anti-spoofing (SAS) database comprising nine diverse spoofing techniques (including multiple speech synthesis and voice conversion systems) for text-independent...
Key finding: Describes the community-driven ASVspoof initiative that addresses the lack of common datasets and standardized protocols by providing the ASVspoof 2015 dataset and organizing competitive evaluations, demonstrating the...
Key finding: Provides a detailed survey of vulnerabilities unique to text-independent ASV systems, emphasizing how prior countermeasures often rely on known spoofing attacks and lack generalizability. It highlights the need for standard...
Key finding: Presents a novel joint modeling approach in the i-vector subspace that simultaneously addresses speaker verification and voice conversion spoofing attack detection without relying on tailored discriminative features. By...
Key finding: Provides an extensive taxonomy and comprehensive experimental comparison of spoofing countermeasures across diverse feature extraction and classification paradigms, examining their generalizability on ASVspoof 2019 and VSDC...

2. What techniques improve speaker verification performance and robustness under practical conditions such as limited data, language mismatch, recording channel variability, and multi-speaker environments?

This research theme focuses on enhancing speaker verification accuracy and reliability in realistic and challenging conditions. It includes methods dealing with limited-duration speech segments, channel distortions (e.g., GSM transcoded speech), multilingual and cross-lingual mismatches, and speaker overlap situations. The research addresses acoustic feature design, fusion of complementary feature sets, model adaptation, and joint optimization strategies to maintain verification performance in heterogeneous real-world scenarios.

Key finding: Demonstrates that combining vocal tract features (MFCC, LPCC) with excitation source features (LPR, LPRP) using feature- and score-level fusion significantly reduces equal error rate (EER) in i-vector based speaker...
Key finding: Empirically shows that both automatic speaker recognition systems based on i-vectors/x-vectors and human listeners experience performance degradation when comparing recordings that differ in language and recording time. The...
Key finding: Proposes a novel data-dependent score fusion algorithm that computes adaptive weights for fusing multiple utterance scores in GSM-transcoded speech speaker verification, using prior knowledge from enrollment scores. This...
Key finding: Introduces an integrated approach combining feature-scale single-channel speech separation with back-end speaker verification, using neural network-based separation models and MFCC-T features. The proposed method trains both...
by ab kh
Key finding: Finds that i-vector-based speaker identification systems outperform Gaussian mixture model (GMM) methods, especially when combined with PLDA classifiers and features like PNCC and RASTA-PLP, and that augmenting features with...
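
Several of the findings above are stated in terms of equal error rate (EER) and score-level fusion. The sketch below shows one simple way these could be computed: the EER is approximated by sweeping every observed score as a threshold and taking the point where the false acceptance and false rejection rates are closest, and fusion is a fixed-weight average of two systems' scores. The function names and the fixed weight are assumptions for illustration; the listed papers use their own (often adaptive) fusion rules, which are not reproduced here.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (false acceptance rate, false rejection rate) at a threshold."""
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return far, frr


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping all observed scores as thresholds."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        far, frr = error_rates(target_scores, impostor_scores, threshold)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Score-level fusion: weighted average of two systems' trial scores.

    The fixed weight is a placeholder assumption, not a tuned value.
    """
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]
```

Comparing `equal_error_rate` on each system's scores with the value obtained on the fused scores is the usual way to check whether two feature sets are complementary.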

3. How can speaker verification fairness across demographic and language groups be improved without requiring subgroup labels or creating reliance on balanced data samples?

This research area addresses performance disparities in speaker verification systems arising from imbalanced representation of demographic groups such as gender and nationality, or language variability. The focus is on algorithmic fairness approaches that automatically identify underperforming groups without explicit annotations, using adversarial learning, group-adapted embeddings, fusion networks, and reweighting schemes. This direction is crucial for equitable deployment of speaker verification in diverse real-world populations and for mitigating biases inherent in training data.

Key finding: Reformulates adversarial reweighting (ARW) for speaker verification with metric learning, enabling the adversarial network to assign higher weights to poorly performing instances without subgroup annotations. Demonstrates...
Key finding: Proposes a modular network architecture combining group-specific embedding adaptation and score fusion to mitigate model unfairness caused by imbalanced gender representation during training. Experiments show that this...
Key finding: Develops an ensemble-based deep learning framework integrating gender and ethnicity classifiers with a Siamese verification network, and demonstrates improved equal error rates and decision cost functions on the large-scale...
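
The idea of giving more weight to poorly performing instances without subgroup labels can be illustrated with a much simpler stand-in than the adversarial reweighting described above. In the sketch below, weights are a softmax over per-example losses, so harder examples count more in the training objective. This is not the ARW method, which learns the weights with an adversarial network; the function names and the temperature parameter are assumptions made for illustration.

```python
import math


def loss_based_weights(losses, temperature=1.0):
    """Softmax over per-example losses: higher loss gives a higher weight.

    A simplified, non-adversarial stand-in for learned instance weights.
    """
    scaled = [loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_loss(losses, temperature=1.0):
    """Weighted training loss that emphasises poorly performing examples."""
    weights = loss_based_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))
```

A lower temperature concentrates weight on the worst examples, while a very high temperature approaches the ordinary unweighted mean.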

All papers in Speaker Verification

Recently, satisfactory results have been obtained in NIST speaker recognition evaluations. These results are mainly due to accurate modeling of a very large development dataset provided by LDC. However, for many realistic scenarios the use...
Automatic verification of a person's identity from their voice is a part of modern telecommunication services. In order to execute a verification task, a speech signal has to be transmitted to a remote server. So, the performance of the...
One characteristic that distinguishes speaker recognition (identification, verification, classification, tracking, etc.) from other biometrics is that it is designed to operate with devices and over channels that were created for other...
This article deals with a technique of voice forgery using the ALISP (Automatic Language Independent Speech Processing) approach. Such a technique makes it possible to transform the voice of an arbitrary person (the impostor), forging the identity...
Device, language and environmental mismatch adversely affect speaker verification (SV) performance. We investigate such effects empirically based on the M3 (multibiometric, multilingual and multi-device) Corpus [1]. Device mismatch (among...
Authentication is the process whereby a user proves his claim to identity. This paper aims to review existing MMBAS and multimodal biometric datasets commonly used for testing and benchmarking results. At the same time, a brief overview...
The main objectives of this work are to describe online bus pass generation and ticket booking using QR codes. Online bus pass generation is helpful to people who have problems with the present technique for the generation of...
After the success of NOLISP'03, the NOLISP'04 summer school and NOLISP'05, we are pleased to present NOLISP'07, the fourth in a series of events related to non-linear speech processing.
Hidden Markov Models based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models where spectrum, pitch and durations of basic speech units are modelled together. The aim of this work is...
The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of background noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition...
This paper presents a new feature transformation technique applied to improve the screening accuracy for the automatic detection of pathological voices. The statistical transformation is based on Hidden Markov Models, obtaining a...
In this paper we propose a new feature extraction algorithm based on nonlinear prediction: the Neural Predictive Coding model which is an extension of the classical LPC one. This model is applied to speaker verification by the...
Learning good representations is of crucial importance in deep learning. Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way. Even though the...
In this paper, an automatic speaker recognition system is implemented by combining feature extraction and feature matching techniques. Feature extraction is implemented using Mel Frequency Cepstral Coefficients (MFCC). The Vector...
This paper proposes a joint verification-localization structure based on split-band analysis of the speech signal and the mixed voicing level. To address the problems in reverberant acoustic environments, a new fundamental frequency...
Clustering is needed in various applications such as biometric person authentication, speech coding and recognition, image compression and information retrieval. Hundreds of clustering methods have been proposed for the task in various...
Because of differences in educational background, accent, etc., different people have their own unique way of pronunciation. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation...
Nowadays, state-of-the-art speaker recognition systems obtain quite accurate results for both text-independent and text-dependent tasks as long as they are trained on a fair amount of development data from the target domain (assuming clean...
This paper investigates the effects of limited speech data in the context of speaker verification using a probabilistic linear discriminant analysis (PLDA) approach. Being able to reduce the length of required speech data is important to...
This paper proposes the addition of a weighted median Fisher discriminator (WMFD) projection prior to length-normalised Gaussian probabilistic linear discriminant analysis (GPLDA) modelling in order to compensate for the additional session...
Speaker verification might be considered a binary classification problem in that the objective is to determine whether or not an utterance is from the individual whose identity is claimed. Several factors make speaker verification...
This paper introduces Locally Recurrent Probabilistic Neural Networks (LRPNN) as an extension of the well-known Probabilistic Neural Networks (PNN). An LRPNN, in contrast to a PNN, is sensitive to the context in which events occur, and...
Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. Speaker recognition is basically divided into two categories: speaker verification and...
This paper investigates advanced channel compensation techniques for the purpose of improving i-vector speaker verification performance in the presence of high intersession variability using the NIST 2008 and 2010 SRE corpora. The...
Substantial progress has been achieved in voice-based biometrics in recent times but a variety of challenges still remain for the speech research community. One such obstacle is reliable speaker authentication from speech signals degraded by...
In the past few years, discriminative approaches to perform speaker detection have shown good results and an increasing interest. Among these methods, SVM-based systems have lots of advantages, especially their ability to deal with a high...
The Internet provides a convenient platform for cyber criminals to anonymously conduct their illegitimate activities, such as phishing and spamming. As a result, in recent years, authorship analysis of anonymous e-mails has received some...
The automatic identification of a person's identity from their voice is a part of modern telecommunication services. In order to execute the identification task, the speech signal has to be transmitted to a remote server. So the performance of...
by Kong Lee and 1 more
Probabilistic linear discriminant analysis (PLDA) has been shown to be effective for modeling channel variability in the i-vector space for text-independent speaker verification. Speaker verification is a binary hypothesis testing problem. Given a...
Authentication System (BAS) based on the fusion of two user-friendly biometric modalities: signature and speech. All biometric data used in this work were extracted from the BIOMET multimodal database. The Signature
A method is described for designing speaker recognition features that are robust to telephone handset distortion. The approach transforms features such as mel-cepstral features, log spectrum, and prosody-based features with a non-linear...
This paper presents the QUT speaker recognition system, as a competing system in the Speakers In The Wild (SITW) speaker recognition challenge. Our proposed system achieved an overall ranking of second place, in the main core-core...
The SRI speaker recognition system for the 2008 NIST speaker recognition evaluation (SRE) incorporates a variety of models and features, both cepstral and stylistic. We highlight the improvements made to specific subsystems and analyze...
Focused on the issue that the robustness of the traditional Mel Frequency Cepstral Coefficient (MFCC) feature degrades drastically in speaker verification in noisy environments, a kind of suitable extraction method for low SNR environments...
In the last few years, the use of i-vectors along with a generative back-end has become the new standard in speaker recognition. An i-vector is a compact representation of a speaker utterance extracted from a low-dimensional total...
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around...
In speech and audio applications, short-term signal spectrum is often represented using mel-frequency cepstral coefficients (MFCCs) computed from a windowed discrete Fourier transform (DFT). Windowing reduces spectral leakage but variance...
Recently, we have investigated the use of state-of-the-art text-dependent speaker verification algorithms for user authentication and obtained satisfactory results mainly by using a fair amount of text-dependent development data from the...
Speaker verification is a challenging problem in speaker recognition where the objective is to determine whether a segment of speech in fact comes from a specific individual. In supervised machine learning terms this is a challenging...
In this paper we describe a system we have developed for automatic broadcast-quality video indexing that successfully combines results from the fields of speaker verification, acoustic analysis, very large vocabulary speech recognition,...
The performance of speaker verification (SV) systems degrades rapidly in noise, rendering them unsuitable for security-critical applications in mobile phones, where false acceptance rates (FAR) of ∼10⁻⁴ are required. However, less...
This paper describes a GMM-based speaker verification system that uses speaker-dependent background models transformed by speaker-specific maximum likelihood linear transforms to achieve a sharper separation between the target and the...
This paper investigates the use of the dimensionality-reduction techniques weighted linear discriminant analysis (WLDA) and weighted median Fisher discriminant analysis (WMFD) before probabilistic linear discriminant analysis (PLDA)...
Support vector machines (SVMs), and kernel classifiers in general, rely on the kernel functions to measure the pairwise similarity between inputs. This paper advocates the use of a discrete representation of speech signals in terms of the...