Source Modeling Research Papers

Application of Multi-way EEG Decomposition for Cognitive Workload Monitoring

2025

This paper describes the use of multi-way decomposition methods to efficiently summarize electroencephalographic (EEG) data. A space-frequency-time atomic decomposition was applied to EEG data recorded while subjects performed tasks... more

Characterization of the Earth's Surface State by Unsupervised Classification: Case of Vegetated, Aquatic and Mineral Surfaces

by Sié Anicet Ouattara

2025, American Journal of Applied Sciences

In this study, we propose an unsupervised classification scheme based on the Dempster-Shafer Theory (DST) and the Dezert-Smarandache Theory (DSmT) to characterize vegetated, aquatic and mineral surfaces. From pre-processed ASTER satellite images (georeferencing, geometric correction and 15 m re-sampling), neo-channels were produced by computing the spectral indices NDVI, MNDWI and NDBaI, which serve as the sources of information for classifying a given pixel. NDVI is a contrast function that highlights vegetation, whereas MNDWI characterizes water and NDBaI identifies mineral surfaces. We then modeled the formalisms of the DST and the DSmT in turn. These formalisms are modeling tools related to probability theory, based on the notions of belief functions and fusion, that take into account certain imperfections (uncertainty, ignorance, etc.) encountered in image acquisition. The DST handles disjunctions between sources, whereas the DSmT handles both disjunctions and conjunctions between sources. Next, we developed the algorithms and related code and implemented them in the MATLAB environment. Our contribution lies in taking into account the imperfections (inaccuracies and uncertainties) of the source information through mass functions based on a simple Gaussian distribution support model, in order to model each focal element independently of the others and to evaluate the membership of a pixel in a class with respect to the majority of elements representing that class. The results show that the DST approach is relatively satisfactory for the unsupervised classification of mineral and aquatic surfaces but not for vegetated surfaces, under all proposed models. The DSmT gives satisfactory results for all the proposed models.
The model with the exclusion integrity constraint E ∩ V ∩ M = ∅ was selected as the best model because, in addition to an average rate of correctly classified pixels of 93.34%, it has a compliance rate with the terrain (96.37%) higher than those of the other models implemented.
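
The fusion step described in this abstract can be illustrated with a minimal Python sketch (the paper itself reports a MATLAB implementation, which is not shown here). The sketch assumes the classical Dempster rule of combination over the frame {V, E, M} and the usual normalized-difference form of the spectral indices. The reflectance values, the mass assignments and the function names are hypothetical and chosen only for illustration; the paper's Gaussian mass model and its DSmT variants are not reproduced.

```python
from itertools import product


def normalized_difference(a, b):
    """Generic normalized-difference index (a - b) / (a + b).

    NDVI is commonly computed this way from near-infrared and red bands;
    the band pairing for the other indices is an assumption left to the caller.
    """
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Mass assigned to contradictory hypotheses is counted as conflict.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Frame of discernment: V = vegetation, E = water, M = mineral.
V, E, M = frozenset("V"), frozenset("E"), frozenset("M")
THETA = V | E | M  # total ignorance

# Hypothetical near-infrared and red reflectances for one pixel.
ndvi = normalized_difference(0.45, 0.08)

# Hypothetical masses from two sources (hand-picked, not derived from the paper).
m_source_1 = {V: 0.7, THETA: 0.3}
m_source_2 = {E: 0.2, V | M: 0.5, THETA: 0.3}

fused = dempster_combine(m_source_1, m_source_2)
best_class = max((V, E, M), key=lambda c: fused.get(c, 0.0))
```

In this toy case the two sources partly disagree (vegetation versus water), and the conflicting mass is redistributed by the normalization term; the pixel is then assigned to the singleton class with the largest fused mass.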

Combining source and system information for limited data speaker verification

by Ramakrishnan Angarai Ganesan

2025, Interspeech 2014

Speaker verification using limited data is always a challenge for practical implementation as an application. An analysis on speaker verification studies for an i-vector based method using Mel-Frequency Cepstral Coefficient (MFCC) feature... more

Complex faulting in the Quetta Syntaxis: fault source modeling of the October 28, 2008 earthquake sequence in Baluchistan, Pakistan, based on ALOS/PALSAR InSAR data

by Muhammad Gaddafi Usman

2024, Earth, Planets and Space

The Quetta Syntaxis in western Baluchistan, Pakistan, is the result of an oroclinal bend of the western mountain belt and serves as a junction for different faults. As this area also lies close to the left-lateral strike-slip Chaman... more

First Experiments on Speaker Identification Combining a New Shift-invariant Phase-related Feature (NRD), MFCCs and F0 Information

by Aníbal Ferreira

2024, Proceedings of the 15th International Joint Conference on e-Business and Telecommunications

In this paper we report on a number of speaker identification experiments that assume a phonetic-oriented segmentation scheme exists such as to motivate the extraction of psychoacoustically-motivated phase and pitch related features. MFCC... more

Heterogeneous Behavior of the Campotosto Normal Fault (Central Italy) Imaged by InSAR GPS and Strong-Motion Data: Insights from the 18 January 2017 Events

by Laura Scognamiglio

2024, Remote Sensing

On 18 January 2017, the 2016–2017 central Italy seismic sequence reached the Campotosto area with four events with magnitude larger than 5 in three hours (major event MW 5.5). To study the slip behavior on the causative fault/faults we... more

Bollettino Sismico Italiano: Analisys of Early Aftershocks of the 2016 MW 6.0 Amatrice, MW 5.9 Visso and MW 6.5 Norcia earthquakes in Central Italy

by Alfonso Giovanni Mandiello

2024

BOLLETTINO SISMICO ITALIANO: ANALISYS OF EARLY AFTERSHOCKS OF THE 2016 MW 6.0 AMATRICE, MW 5.9 VISSO AND MW 6.5 NORCIA EARTHQUAKES IN CENTRAL ITALY B. Castello e Gruppo di Lavoro Bollettino Sismico Italiano (A. Nardi, A. Marchetti, F.M.... more

Optimization of Excitation in FDTD Method and Corresponding Source Modeling

by Bojan Dimitrijevic

2023, Radioengineering

Source and excitation modeling in FDTD formulation has a significant impact on the method performance and the required simulation time. Since the abrupt source introduction yields intensive numerical variations in whole computational... more

Application of the Dempster-Shafer Theory to the Classification of Pixels from Aster Satellite Images and Spectral Indices

by Wognin Joseph VANGAH

2023, Journal of Applied Mathematics and Physics

In this paper, it is proposed to apply the Dempster-Shafer Theory (DST) or the theory of evidence to map vegetation, aquatic and mineral surfaces with a view to detecting potential areas of observation of outcrops of geological formations... more

Intra-frame variability as a predictor of frame classifiability

by Torbjørn Svendsen

2023, Interspeech 2010

This paper examines the association between the variability of the speech signal inside an analysis frame and the relative difficulty of classifying that frame. We introduce a novel measure of speech frame variability and show through... more

Glottal waveforms for speaker inference & a regression score post-processing method applicable to general classification problems

by Girija Chetty

2023

I wish to thank my primary supervisor Prof. Michael Wagner for his introducing me to speech as a biometric, and for his support, suggestions and guidance throughout the learning process that has been my doctoral studies. Thank you also to... more

Application of the Dempster-Shafer Theory to the Classification of Pixels from Aster Satellite Images and Spectral Indices

by Adama Koné

2023, Journal of Applied Mathematics and Physics

In this paper, it is proposed to apply the Dempster-Shafer Theory (DST) or the theory of evidence to map vegetation, aquatic and mineral surfaces with a view to detecting potential areas of observation of outcrops of geological formations... more

Semi-automatically Mapping Structured Sources into the Semantic Web

by Shubham Gupta

2023, Lecture Notes in Computer Science

Linked data continues to grow at a rapid rate, but a limitation of a lot of the data that is being published is the lack of a semantic description. There are tools, such as D2R, that allow a user to quickly convert a database into RDF,... more

A Comparison of Cepstral Features in the Detection of Pathological Voices by Varying the Input and Filterbank of the Cepstrum Computation

by M Kiran Reddy

2023, IEEE Access

Automatic voice pathology detection enables objective assessment of pathologies that affect the voice production mechanism. Detection systems have been developed using the traditional pipeline approach (consisting of the feature extraction part and the detection part) and using the modern deep learning-based end-to-end approach. Due to the lack of vast amounts of training data in the study area of pathological voice, the former approach is still a valid choice. In the existing detection systems based on the traditional pipeline approach, the mel-frequency cepstral coefficient (MFCC) features can be regarded as the de facto standard feature set. In this study, automatic voice pathology detection is investigated by comparing the performance of various MFCC variants derived by considering two factors: the input and the filterbank in the cepstrum computation. For the first factor, three inputs (the voice signal, the glottal source and the vocal tract) are compared. The glottal source and the vocal tract are estimated using the quasi-closed phase glottal inverse filtering method. For the second factor, the mel-frequency and linear-frequency filterbanks are compared. Experiments were conducted separately using six databases consisting of voices produced by speakers suffering from one of four disorders (dysphonia, Parkinson's disease, laryngitis, or heart failure) and by healthy speakers. Support vector machine (SVM) was used as the classifier. The results show that by combining mel- and linear-frequency cepstral coefficients derived from the glottal source and vocal tract, better overall detection accuracy was obtained compared to the de facto MFCC features derived from the voice signal. Furthermore, this combination provided comparable or better performance than four existing cepstral feature extraction techniques in clean and high signal-to-noise ratio (SNR) conditions.

INDEX TERMS Voice disorders, glottal inverse filtering, support vector machine, cepstral coefficients.

The associate editor coordinating the review of this manuscript and approving it for publication was Shuihua Wang.

I. INTRODUCTION

Voice pathologies arise either due to physical changes in the voice production mechanism (e.g., in the respiratory system, vocal folds, and vocal tract) [1], [2] or due to improper vocal use when the physical structure of the mechanism is normal (e.g., vocal fatigue or ventricular phonation) [3]-[5]. Examples of voice pathologies are dysarthria [7], dysphonia [8], vocal polyp [9], and developmental dysphasia [13]. Voice pathology may also indicate early neurodegenerative disease such as Parkinson's disease (PD) [10]-[12], [14]. Voice pathology detection refers to a technology to automatically distinguish normal voices from pathological voices by computer using the recorded voice signal. Existing voice pathology detection systems can be divided into two categories: traditional pipeline systems and modern end-to-end systems [15]. The traditional pipeline system consists of two components [15], [16]: the feature extraction part and the detection part. The feature extraction part tries to capture discriminative information from acoustic voice signal waveforms by representing this information in compressed forms using a set of pre-defined features. The feature sets reported in the literature for voice pathology detection can be grouped into four categories: (1) perturbation measures (such as jitter and shimmer); (2) spectral and cepstral measures

Speaker Recognition: Advancements and Challenges

by Homayoon Beigi

2023

Speaker Recognition is a multi-disciplinary branch of biometrics that may be used for identification, verification, and classification of individual speakers, with the capability of tracking, detection, and segmentation by extension.... more

Application of the Dempster-Shafer Theory to the Classification of Pixels from Aster Satellite Images and Spectral Indices

by Adles Francis Kouassi

2023, Journal of Applied Mathematics and Physics

In this paper, it is proposed to apply the Dempster-Shafer Theory (DST) or the theory of evidence to map vegetation, aquatic and mineral surfaces with a view to detecting potential areas of observation of outcrops of geological formations... more

Spoken Keyword Retrieval Using Source and System Features

by Hemant Patil

2023, Lecture Notes in Computer Science

In this paper, a novel excitation source-related feature set, viz., Teager Energy-based Mel Frequency Cepstral Coefficients (T-MFCC) is proposed for the task of spoken keyword detection. Experiments are carried out on TIMIT database for... more

Mobile phone identification using recorded speech signals

by Constantine Kotropoulos

2023, 2014 19th International Conference on Digital Signal Processing

In this paper, we elaborate on mobile phone identification from recorded speech signals. The goal is to extract intrinsic traces related to the mobile phone used to record a speech signal. Mel frequency cepstral coefficients (MFCCs) are... more

Speaker recognition via fusion of subglottal features and MFCCs

by Abeer Alwan

2023, Interspeech 2014

Motivated by the speaker-specificity and stationarity of subglottal acoustics, this paper investigates the utility of subglottal cepstral coefficients (SGCCs) for speaker identification (SID) and verification (SV). SGCCs can be computed... more

Figure 1: Vowel spectrograms comparing the within-speaker variability of speech (top panel) and subglottal acoustics (bottom panel). Data are sampled from the recordings of a female speaker in the WashU-UCLA corpus.

Figure 2: Block diagram of the proposed SID/SV framework. The arrows in black correspond to training (enrollment) and the arrows in red correspond to evaluation. Subscripts M and S denote MFCCs and SGCCs, respectively. The As denote speaker models and the fs denote acoustic model scores (for test data).

Figure 3: (a) Means (circles) and standard deviations (error bars) of the segment-level correlations (segment = vowel token) between actual and estimated SGCCs. Results from all 50 speakers in the WashU-UCLA corpus are pooled together. (b) Distribution of speaker-level correlation (i.e., average segment-level correlation on a per-speaker basis) for three different cepstral coefficients (y1, y14, y22).

Figure 4: Percent identification error (J-) as a function of SGCC weight (0 weight = MFCCs only) for the TIMIT database.

Figure 5: Detection error tradeoff (DET) curves corresponding to different SGCC weights (0 weight = MFCCs only) for the 5 second test trials in the NIST 2008 database.

Table 1: J-ratio, a measure of class separation (class = speaker), for different features (+ denotes concatenation). Features were extracted from isolated vowel recordings of speech and subglottal acoustics, for all 50 speakers in the WashU-UCLA corpus.

Table 2: Percent identification errors for the TIMIT database in three different conditions, for the baseline (MFCC-only) and the best combined systems (relative reductions in parentheses).

Application of the Dempster-Shafer Theory to the Classification of Pixels from Aster Satellite Images and Spectral Indices

by WOGNIN VANGAH

2023, Journal of Applied Mathematics and Physics

In this paper, it is proposed to apply the Dempster-Shafer Theory (DST) or the theory of evidence to map vegetation, aquatic and mineral surfaces with a view to detecting potential areas of observation of outcrops of geological formations... more

A Graph-Based Approach to Learn Semantic Descriptions of Data Sources

by José Luis Ambite

2023, Advanced Information Systems Engineering

Semantic models of data sources and services provide support to automate many tasks such as source discovery, data integration, and service composition, but writing these semantic descriptions by hand is a tedious and time-consuming task.... more

Heterogeneous Behavior of the Campotosto Normal Fault (Central Italy) Imaged by InSAR GPS and Strong-Motion Data: Insights from the 18 January 2017 Events

by Piera Gambino

2022, Remote Sensing

On 18 January 2017, the 2016–2017 central Italy seismic sequence reached the Campotosto area with four events with magnitude larger than 5 in three hours (major event MW 5.5). To study the slip behavior on the causative fault/faults we... more

Speaker verification based on fusion of acoustic and articulatory information

by Jangwon Kim

2022, Interspeech 2013

We propose a practical, feature-level fusion approach for combining acoustic and articulatory information in speaker verification task. We find that concatenating articulation features obtained from the measured speech production data... more

A new speaker identification algorithm for gaming scenarios

by Hoang Do

2022, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speaker identification is a well-established research problem but has not been a major application used in gaming scenarios. In this paper, we propose a new algorithm for the open-set, text-independent, speaker ID problem, applied as an... more

Spoken Keyword Retrieval Using Source and System Features

by Nikhil Bhendawade

2022, Lecture Notes in Computer Science

In this paper, a novel excitation source-related feature set, viz., Teager Energy-based Mel Frequency Cepstral Coefficients (T-MFCC) is proposed for the task of spoken keyword detection. Experiments are carried out on TIMIT database for... more

Application of the Dempster-Shafer Theory to the Classification of Pixels from Aster Satellite Images and Spectral Indices

by Wognin Joseph VANGAH

2022, Journal of Applied Mathematics and Physics

In this paper, it is proposed to apply the Dempster-Shafer Theory (DST) or the theory of evidence to map vegetation, aquatic and mineral surfaces with a view to detecting potential areas of observation of outcrops of geological formations... more

Application of the Dempster-Shafer Theory to the Classification of Pixels from Aster Satellite Images and Spectral Indices

by Wognin Joseph VANGAH

2022, Journal of Applied Mathematics and Physics

In this paper, it is proposed to apply the Dempster-Shafer Theory (DST) or the theory of evidence to map vegetation, aquatic and mineral surfaces with a view to detecting potential areas of observation of outcrops of geological formations... more

Performance Assessment of A Diffraction Field Computation Method Based on Source Model

by Gokhan B Esmer

2022, 2008 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video

Efficient computation of scalar optical diffraction field due to an object is an essential issue in holographic 3D television systems. The first step in the computation process is to construct an object. As a solution for this step, we... more

Speech Recognition using MFCC

by Siwat Suksri

2022

This paper describes an approach of speech recognition by using the Mel-Scale Frequency Cepstral Coefficients (MFCC) extracted from speech signal of spoken words. Principal Component Analysis is employed as the supplement in feature... more

The database consists of two groups of speech samples recorded in an environmentally controlled recording room to minimize acoustical interference with the quality of the sound samples during recording. The first group comprises thirty spoken sound samples of the word “MFCC” and the other comprises thirty sound samples of the word “PCA”. All sound signals are recorded under closely matched conditions, such as the same recording length and sound amplitude level. The sampling frequency is originally set at 44.1 kHz for all recordings in order to preserve the acoustical quality of the sound signals. Prior to the detection of voiced segments in the speech sounds, the signals are digitized offline via a 16-bit A/D converter. Thereafter, the signals are monitored and edited for any sound artifacts that could affect further processing phases. Furthermore, silences longer than half a second are manually removed in the Goldwave sound editor program.

Fig. 2 Workflow for the MFCC-based speech classification. At which uv is the unvoiced segment of the n segment with energy at scale 6, maximized.

The mel-scale used in this work maps the linear frequency scale of the speech signal to a logarithmic scale for frequencies higher than 1 kHz. This makes the spectral frequency characteristics of the signal correspond closely to human auditory perception [5]. The mel-scale frequency mapping is formulated as:

in which f_mel is the perceived frequency and f_lin is the real linear frequency in the speech signal. In the filtering phase, a series of 16 triangular band-pass filters (N = 16) is used as a filter bank whose center frequencies and bandwidths are selected according to the mel-scale. They span the entire signal bandwidth [0, f_s/2]. The center frequency of each individual filter is defined;
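
The mel mapping and the 16-filter triangular bank described above can be sketched in a few lines of Python. The equations themselves did not survive in this excerpt, so the sketch assumes the widely used mapping f_mel = 2595 log10(1 + f_lin / 700), which may differ from the exact form used by the authors. The function names are hypothetical, and the upper band edge of 22050 Hz is taken as half of the 44.1 kHz sampling rate mentioned in the database description.

```python
import math


def hz_to_mel(f_hz):
    """Common mel mapping (an assumption; the paper's own equation is not shown)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_edges(n_filters, f_max_hz):
    """Return n_filters + 2 edge frequencies (Hz), equally spaced on the mel scale.

    Filter i starts at edges[i], peaks at edges[i + 1] and ends at edges[i + 2].
    """
    step = hz_to_mel(f_max_hz) / (n_filters + 1)
    return [mel_to_hz(i * step) for i in range(n_filters + 2)]


def triangular_weight(f_hz, low, center, high):
    """Weight of one triangular band-pass filter at frequency f_hz."""
    if f_hz <= low or f_hz >= high:
        return 0.0
    if f_hz <= center:
        return (f_hz - low) / (center - low)
    return (high - f_hz) / (high - center)


# 16 filters spanning 0 Hz to the Nyquist frequency of a 44.1 kHz recording.
edges = filter_edges(16, 22050.0)
centers = edges[1:-1]
```

Because the edges are equally spaced in mel rather than in hertz, the filters are narrow at low frequencies and progressively wider above roughly 1 kHz, which is the perceptual behavior the paragraph describes.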

Fig. 5 Comparative training results

Results of the sixteenth-order MFCCs extracted from the database of spoken words are shown in Figure (4). A significant difference in magnitude can be clearly identified between the sample classes of different words. The comparative performances obtained from several trials of sample selection in the training and testing stages are plotted as box-and-whisker diagrams for convenient examination of their descriptive statistics. The training results shown in Figure (5) indicate that the SVM classifier gives a much more compact distribution with consistent training scores compared to ML classification. In addition, a smaller spread between the maximum and minimum adjacent values, depicted as the top and bottom bars of the individual box plots, can be noted for SVM. These suggest that the SVM classifier gives more consistent and reliable performance in the training stage than ML does. The testing results shown in Figure (6) consistently reveal a similar tendency of improving recognition as larger samples are used in the testing stage. The distributions of SVM scores are tighter and more consistent than those of ML for all percentages of the dataset tested for recognition.

This paper addressed the principle of speech MFCC extraction for performing word recognition. Details of the technique are described, and its performance in terms of training scores agrees with the improvement in recognition rates when words are trained with a support vector machine.

Combining pitch and MFCC for speaker recognition systems

by Hassan Ezzaidi

2022

Usually, speaker recognition systems do not take into account the short-term dependence between the vocal source and the vocal tract. A feasibility study that retains this dependence is presented here. A model of joint probability... more

A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique

by Prateek Srivastava

2022

Automatic speaker recognition system is used to recognize an unknown speaker among several reference speakers by making use of speaker-specific information from their speech. In this paper, we introduce a novel, hierarchical,... more

Speaker verification in sensor and acoustic environment mismatch conditions

by Rohit Sinha

2022, International Journal of Speech Technology

Our initial speaker verification study exploring the impact of mismatch in training and test conditions finds that the mismatch in sensor and acoustic environment results in significant performance degradation compared to other mismatches... more

Techniques for Crosslingual Voice Conversion

by Anderson Fraiha Machado

2022

The crosslingual voice conversion problem refers to the replacement of a speaker’s timbre or vocal identity in a recorded sentence, assuming that the source speaker and target speaker use different languages. This problem differs... more

System Identification Algorithms Applied to Glottal Model Fitting

by Irene Murtagh

2022, International Joint Conference on Biomedical Engineering Systems and Technologies

This study proposes a new method of fitting a glottal model to the glottal flow estimate using system identification (SI) algorithms. Each period of the glottal estimate is split into open and closed phases and each phase is modelled as... more

Extraction and Utilization of Excitation Information of Speech: A Review

by Sudarsana Kadiri

2022, Proceedings of the IEEE

Fig. 1. Demonstration of the production of voiced speech in two phonation types: normal (left) and breathy (right). The upper part of the figure shows three time-domain waveforms (the speech pressure signal, the glottal flow estimated by GIF, and the glottal area function) and the lower part shows images of the vocal folds. The gray vertical lines show the instants when the images of the vocal folds were taken by the transoral high-speed digital videoendoscopy system (adopted from [38]).

Fig. 2. Schematic presentation of the human speech production mechanism (adopted from [39]). Left: three excitation waveforms. Right: corresponding speech waveform.

Fig. 3. (a) Source-filter model of speech production. (b) Extraction of excitation using inverse filtering.

Fig. 4 shows the EGG signal in one glottal cycle. The EGG signal consists of four distinct phases [46]: the closing phase, the CP, the opening phase, and the open phase. In the closing phase (between t1 and t3), the vocal folds first start contacting at the lower margins (between t1 and t2) and then moving the contact to the upper margins (between t2 and t3). Generally, the closing of the vocal folds is faster than the opening, and the instant of the maximum slope occurs at t2, which can be seen as a prominent negative peak in the dEGG shown in Fig. 4(b). The vocal folds are in full contact during the CP (between t3 and t4), blocking the passage of air through the glottis. In the opening phase (between t4 and t6), the lower margins of the vocal folds begin to separate slowly from each other (between t4 and t5), followed by separation along the upper margins of the vocal folds (between t5 and t6). The instant of the

Fig. 4. Segment of (a) EGG signal and (b) corresponding dEGG signal. Four parts of the glottal cycle are defined as follows: the closing phase (from t1 to t3), the CP (from t3 to t4), the opening phase (from t4 to t6), the open phase (from t6 to t7), and the pitch period (from t1 to t7) [44].
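
The relation between the EGG signal and its derivative (dEGG) described for Fig. 4 can be illustrated with a short Python sketch. It is not taken from the review: the synthetic waveform, the threshold value and the function names are all hypothetical, and the sign convention (fast closing appears as a negative dEGG peak) simply follows the description above.

```python
import math


def first_difference(x):
    """Discrete approximation of the time derivative (dEGG from EGG)."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def negative_peaks(d, rel_threshold=0.5):
    """Indices of local minima of d that fall below rel_threshold * min(d)."""
    lowest = min(d)
    if lowest >= 0:
        return []
    floor = rel_threshold * lowest
    return [
        i
        for i in range(1, len(d) - 1)
        if d[i] <= floor and d[i] < d[i - 1] and d[i] <= d[i + 1]
    ]


# Synthetic glottal cycle of 100 samples (purely illustrative shape):
closing = [0.5 * (1.0 + math.cos(math.pi * k / 10)) for k in range(10)]  # fast change
closed = [0.0] * 30                                                      # CP, flat
opening = [(j + 1) / 40 for j in range(40)]                              # slow change
open_phase = [1.0] * 20                                                  # flat

egg = (closing + closed + opening + open_phase) * 3  # three identical cycles
degg = first_difference(egg)
closure_instants = negative_peaks(degg)
pitch_periods = [b - a for a, b in zip(closure_instants, closure_instants[1:])]
```

On this toy signal the detector finds one negative dEGG peak per cycle, located in the fast closing segment, and the spacing between consecutive peaks gives the pitch period in samples.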

Fig. 5. Visualization of the closing and opening phases of the glottal cycle by simultaneous electroglottographic and high-speed recordings. Vertical bars to the EGG and dEGG signals indicate the moment in time at which the visual image occurs. The EGG sampling frequency is 44444 Hz, and the high-speed camera sampling frequency is 3704 frames/s (reproduced from [46] with permission of the publisher, the Acoustical Society of America).

Fig. 6. Computation of time-based and amplitude-based parameters from (a) glottal pulse and (b) its first time-derivative. The ac-flow (f_ac), minimum flow (f_min), and the minimum of the derivative (d_min).

Fig. 7. Illustration of some excitation features. (a) Speech signal. (b) dEGG signal. (c) LP residual. (d) Glottal flow derivative. (e) Instantaneous F0.

Table 3 Trend in Prosody Features of Emotional Utterances With Respect to Neutral State Utterance (Increase: ↑ and Decrease: ↓) [185], [186]

Table 2 Trend in Spectral Features of Emotional Utterances With Respect to Neutral State Utterance (Increase: ↑ and Decrease: ↓) [185], [186]

Discrimination between speech and music based on a low frequency modulation feature

by Stefan Karnebäck

2022

The possibility to discriminate between speech and music signals by using a feature based on low frequency modulation has been investigated. Three different low frequency modulation parameters have been extracted and tested concerning the... more

PCA/LDA approach for text-independent speaker recognition

by Zhenhao Ge

2021

Various algorithms for text-independent speaker recognition have been developed through the decades, aiming to improve both accuracy and efficiency. This paper presents a novel PCA/LDA-based approach that is faster than traditional... more

Techniques for Crosslingual Voice Conversion

by Anderson Fraiha Machado

2021, International Symposium on Multimedia

The cross lingual voice conversion problem refers to the replacement of a speaker's timbre or vocal identity in a recorded sentence, assuming that the source speaker and target speaker use different languages. This problem differs... more

PCA/LDA approach for text-independent speaker recognition

by Zhenhao Ge

2021, Independent Component Analyses, Compressive Sampling, Wavelets, Neural Net, Biosystems, and Nanoengineering X

Various algorithms for text-independent speaker recognition have been developed through the decades, aiming to improve both accuracy and efficiency. This paper presents a novel PCA/LDA-based approach that is faster than traditional... more

VOICE ACTIVITY DETECTION ANALYSIS

by Anirban Chakraborty

2020, International Research Journal of Modernization in Engineering Technology and Science(IRJMETS)

VAD addresses the problem of discriminating between external noise and voice. Because this is a difficult problem, various techniques have been suggested. Some are based upon power spectral density derived characteristics, and others... more

Comparison of MFCC and pitch synchronous AM, FM parameters for speaker identification

by Jean Rouat

2016

We study robust pitch synchronous parameters that are derived from envelope and instantaneous frequencies estimated via a bank of cochlear filters. Closed set Speaker Identification experiments are performed on the SPIDRE corpus with... more

Characterization of the voice source by the DCT for speaker information

by Ramakrishnan Angarai Ganesan

2015, M S thesis, I.I.Sc. Bangalore, India

In the source-filter model of speech production, physiologically, the source corresponds to the vocal fold vibrations and the filter corresponds to the spectrum-shaping vocal tract. Vocal tract-based features like the mel-frequency cepstral coefficients (MFCCs) have been shown to contain speaker information. However, voice source (VS)-based features have also been shown to perform well in speaker recognition tasks, thereby revealing that the VS does contain speaker information. Moreover, a combination of the vocal tract and VS-based features has been shown to give an improved performance, showing that the latter contains supplementary speaker information.
In this study, the existing techniques for extracting speaker information from the VS are reviewed, and it is observed that parametric features perform worse than non-parametric features. Here, an attempt is made to propose an alternate way of characterizing the VS to extract speaker information, and to study the merits and shortcomings of the proposed speaker-specific features.
The integrated linear prediction residual (ILPR) is used as the VS estimate. It is hypothesized here that a speaker’s voice may be characterized by the relative proportions of the harmonics present in the VS. The pitch synchronous discrete cosine transform (DCT) is shown to capture these, along with the gross shape of the ILPR, in a few coefficients. The ILPR and hence its DCT coefficients are visually observed to have both inter- and intra-speaker variability, and thus it is hypothesized that the distribution of the DCT coefficients may capture speaker information, and this distribution is modeled by a Gaussian mixture model (GMM).
The DCT coefficients of the ILPR (termed the DCTILPR) are directly used as a feature vector in speaker identification (SID) tasks. By conducting SID experiments on three standard databases, it is found that the proposed DCTILPR features fare comparably with the existing VS-based features. It is also found that the gross shape of the VS contains most of the speaker information, and the very fine structure of the VS does not help in distinguishing speakers, and instead leads to more confusion between speakers. The major drawbacks of the DCTILPR are the session and handset variability, but they are also present in the existing state-of-the-art speaker-specific VS-based features and the MFCCs, and hence seem to be common problems. There are techniques to compensate for these variabilities, which need to be used when the systems using these features are deployed in an actual application.
The DCTILPR is found to improve the SID accuracy of a system trained with MFCC features by 12%, indicating that the DCTILPR features capture speaker information which is missed by the MFCCs. It is also found that a combination of MFCC and DCTILPR features on a speaker verification task gives significant performance improvement in the case of short test utterances. Thus, on the whole, this study proposes an alternate way of extracting speaker information from the VS, and adds to the evidence for speaker information present in the VS.
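The pitch synchronous DCT step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `dct_ii` and `cycle_features`, the orthonormal DCT-II scaling, the default of 8 retained coefficients, and the use of precomputed cycle boundaries (`closure_instants`) are all assumptions made for illustration. ILPR estimation and the GMM stage are left out.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence (naive O(n^2) form, for clarity)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Orthonormal scaling: the transform preserves signal energy.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def cycle_features(signal, closure_instants, num_coeffs=8):
    """Return one feature vector per pitch cycle.

    `signal` stands in for a voice source estimate and `closure_instants`
    for sample indices marking cycle boundaries (both hypothetical inputs).
    Only the first `num_coeffs` DCT coefficients of each cycle are kept,
    so the features describe the gross shape of the cycle.
    """
    feats = []
    for start, end in zip(closure_instants, closure_instants[1:]):
        cycle = signal[start:end]
        if len(cycle) < num_coeffs:
            continue  # skip cycles too short to yield num_coeffs coefficients
        feats.append(dct_ii(cycle)[:num_coeffs])
    return feats
```

Calling `cycle_features(signal, [0, 20, 40, 60], num_coeffs=4)` on a waveform with a 20-sample period would give three 4-dimensional vectors, one per cycle. Vectors of this kind are what a GMM would then be fitted to.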

Speaker Recognition: Advancements and Challenges

by Homayoon Beigi

2015, New Trends and Developments in Biometrics

Additional information is available at the end of the chapter

A comparative evaluation of pitch modification techniques

by Thierry Dutoit

2013

This paper addresses the problem of pitch modification, as an important module for an efficient voice transformation system. The Deterministic plus Stochastic Model of the residual signal we proposed in a previous work is...

Voice source cepstrum processing for speaker identification

by Jon Gudnason

2013

Voice source analysis and modelling has played a key role in important speech applications such as speech recognition, speech synthesis and speaker recognition. This work presents a robust algorithm for glottal closure detection...

Data-driven voice source waveform modelling

by Jon Gudnason

2013

This paper presents a data-driven approach to the modelling of voice source waveforms. The voice source is a signal that is estimated by inverse-filtering speech signals with an estimate of the vocal tract filter. It is used in...

Identifying robust and sensitive frequency bands for interrogating neural oscillations

by Alexander Shackman

2010

Recent years have seen an explosion of interest in using neural oscillations to characterize the mechanisms supporting cognition and emotion. Oftentimes, oscillatory activity is indexed by mean power density in predefined frequency bands. Some investigators use broad bands originally defined by prominent surface features of the spectrum. Others rely on narrower bands originally defined by spectral factor analysis (SFA). Presently, the robustness and sensitivity of these competing band definitions remain unclear. Here, a Monte Carlo-based SFA strategy was used to decompose the tonic (“resting” or “spontaneous”) electroencephalogram (EEG) into five bands: delta (1–5 Hz), alpha-low (6–9 Hz), alpha-high (10–11 Hz), beta (12–19 Hz), and gamma (>21 Hz). This pattern was consistent across SFA methods, artifact correction/rejection procedures, scalp regions, and samples. Subsequent analyses revealed that SFA failed to deliver enhanced sensitivity; narrow alpha sub-bands proved no more sensitive than the classical broadband to individual differences in temperament or mean differences in task-induced activation. Other analyses suggested that residual ocular and muscular artifact was the dominant source of activity during quiescence in the delta and gamma bands. This was observed following threshold-based artifact rejection or independent component analysis (ICA)-based artifact correction, indicating that such procedures do not necessarily confer adequate protection. Collectively, these findings highlight the limitations of several commonly used EEG procedures and underscore the necessity of routinely performing exploratory data analyses, particularly data visualization, prior to hypothesis testing. They also suggest the potential benefits of using techniques other than SFA for interrogating high-dimensional EEG datasets in the frequency or time–frequency (event-related spectral perturbation, event-related synchronization/desynchronization) domains.
KEY WORDS: principal components analysis (PCA); exploratory factor analysis (EFA); blind source separation (BSS); resting neural activity; resting EEG; frontal alpha asymmetry; frontal EEG asymmetry.
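As an illustration of indexing oscillatory activity by mean power density in predefined frequency bands, the sketch below averages a simple periodogram over the delta, alpha-low, alpha-high and beta ranges quoted in the abstract. It is a toy under stated assumptions, not the paper's pipeline: the names `BANDS`, `power_spectrum` and `mean_band_power` are hypothetical, the spectrum is a plain periodogram with no windowing or averaging, and gamma is omitted because the abstract does not give its upper edge.

```python
import cmath
import math

# Band edges in Hz as quoted in the abstract (gamma left out: no upper edge given).
BANDS = {
    "delta": (1, 5),
    "alpha_low": (6, 9),
    "alpha_high": (10, 11),
    "beta": (12, 19),
}


def power_spectrum(x, fs):
    """Periodogram of a mean-removed signal via a naive DFT.

    Returns (freqs, power) for the non-negative frequency bins.
    Two-sided scaling |X|^2 / (fs * n) is used (an assumption).
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        freqs.append(k * fs / n)
        power.append(abs(acc) ** 2 / (fs * n))
    return freqs, power


def mean_band_power(freqs, power, bands=BANDS):
    """Average the power density over the bins that fall inside each band."""
    result = {}
    for name, (low, high) in bands.items():
        vals = [p for f, p in zip(freqs, power) if low <= f <= high]
        result[name] = sum(vals) / len(vals) if vals else float("nan")
    return result
```

For a synthetic 10 Hz sine sampled at 64 Hz for two seconds, `mean_band_power(*power_spectrum(x, 64))` would put nearly all the power in `alpha_high`. Real EEG would normally call for windowed, averaged spectral estimates rather than a single periodogram.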
