Voice Conversion

398 papers

328 followers

About this topic

Voice conversion is a technology and research field focused on transforming a source speaker's voice characteristics to sound like those of a target speaker, while preserving the linguistic content. It involves signal processing, machine learning, and speech synthesis techniques to manipulate vocal attributes such as pitch, timbre, and accent.

Key research themes

1. How can residual prediction techniques improve spectral detail and naturalness in voice conversion?

This theme investigates methods to predict or reconstruct the residual (excitation) signals in voice conversion frameworks, aiming to enhance the spectral details and naturalness of the converted speech. Since spectral envelope transformation alone often results in over-smoothed or synthetic-sounding speech, incorporating accurate residual prediction is critical. Research focuses on comparing residual prediction techniques, modeling the correlation between spectral features and residuals, and developing methods that better preserve speaker-dependent excitation characteristics.

A STUDY ON RESIDUAL PREDICTION TECHNIQUES FOR VOICE CONVERSION

by rohini dhorje

2016

Key finding: Compared several existing residual prediction methods and proposed a novel approach that predicts target residuals conditioned on converted spectral features using line spectral frequency-based transforms. Experimental... Read more

OBSERVATION-MODEL ERROR COMPENSATION FOR ENHANCED SPECTRAL ENVELOPE TRANSFORMATION IN VOICE CONVERSION

by Fernando Villavicencio

2015

Key finding: Identified that probabilistic spectral envelope transformations based on Gaussian mixture models suffer from over-smoothing and modeling errors, leading to degraded speech quality. Proposed a novel deterministic spectral... Read more

Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra

by Kais Ouni

2024, International Journal of Speech Technology

Key finding: Applied deep neural networks and Gaussian mixture models to convert esophageal speech features into laryngeal speech vocal tract features, using a voice conversion approach specifically accounting for pathological vocal tract... Read more

2. What strategies enable non-parallel voice conversion by establishing frame-level or sequence-level mappings without shared parallel data?

Non-parallel voice conversion (VC) methods aim to build conversion systems without the need for parallel utterances of source and target speakers, which is significant for practical deployment. This research theme explores algorithms to discover correspondences between frames or segments of unaligned source and target speech through clustering, recognition, or iterative alignment methods. Approaches include DNN-HMM-based frame recognition, iterative nearest-neighbor alignment with temporal context, and latent space embeddings to create mapping functions, prioritizing alignment accuracy and quality of synthesized speech.
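
As a rough illustration of the frame-level mapping idea described above, the following Python sketch pairs each source frame with its nearest target frame after stacking neighbouring frames into temporal-context vectors. It is a single alignment pass under simplifying assumptions (Euclidean distance, edges padded by repeating the boundary frame, toy two-dimensional features) and is not the procedure of any particular paper in this topic; the function names `stack_context` and `nearest_neighbor_pairs` are hypothetical.

```python
import math


def stack_context(frames, k):
    """Concatenate each frame with its k left and k right neighbours.

    Edges are padded by repeating the first or last frame (an assumption).
    """
    n = len(frames)
    stacked = []
    for i in range(n):
        vec = []
        for j in range(i - k, i + k + 1):
            # Clamp the index so out-of-range neighbours reuse the edge frame.
            vec.extend(frames[min(max(j, 0), n - 1)])
        stacked.append(vec)
    return stacked


def nearest_neighbor_pairs(src, tgt, k=1):
    """Return (source_index, target_index) pairs by nearest context vector."""
    src_ctx = stack_context(src, k)
    tgt_ctx = stack_context(tgt, k)
    pairs = []
    for i, s in enumerate(src_ctx):
        best = min(range(len(tgt_ctx)), key=lambda j: math.dist(s, tgt_ctx[j]))
        pairs.append((i, best))
    return pairs


# Toy 2-dimensional frames; a real system would use spectral features
# such as cepstral coefficients extracted from unaligned recordings.
source = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[2.1, 1.9], [0.1, -0.1], [1.0, 1.2], [0.9, 1.1]]
pairs = nearest_neighbor_pairs(source, target, k=0)
```

Iterative schemes would alternate a pass like this with re-estimating a conversion function on the resulting pairs; that loop is omitted here to keep the sketch short.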

Mapping frames with DNN-HMM recognizer for non-parallel voice conversion

by Minghui Dong

2025, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Key finding: Introduced a method that uses a DNN-HMM speech recognizer to generate phoneme posterior (pseudo likelihood) vectors for source and target frames, enabling clustering and frame mapping without parallel data via similarity... Read more

Non-Parallel Voice Conversion Using Joint Optimization of Alignment by Temporal Context and Spectral Distortion

by Hadas Benisty

2022

Key finding: Presented Temporal-Context INCA (TC-INCA), a generalization of the iterative nearest neighbor and conversion alignment (INCA) method, that incorporates temporal context vectors (sequences of features) rather than single... Read more

Text-free non-parallel many-to-many voice conversion using normalising flows

by Magdalena Proszewska

2022, ArXiv

Key finding: Explored flow-based generative models to achieve non-parallel voice conversion without requiring text transcriptions or phonetic alignment by learning invertible, lossless encodings of speech spectrograms. Demonstrated... Read more

3. How can system fusion and hybrid methods enhance voice conversion performance by leveraging complementary strengths of distinct approaches?

This theme addresses the combination of multiple voice conversion techniques to harness their complementary advantages, such as statistical robustness, spectral detail preservation, and prosodic naturalness. By fusing systems like Gaussian mixture models (GMM) and frequency warping (FW), or exemplar-based and parametric methods, researchers can create hybrid frameworks that yield better speaker similarity and naturalness than individual methods. The research evaluates the feasibility, integration designs, and empirical gains in objective and subjective metrics.

System fusion for high-performance voice conversion

by Minghui Dong

2025, Interspeech 2015

Key finding: Proposed a system fusion framework combining Gaussian mixture model (GMM)-based statistical parametric and frequency warping (FW) based voice conversion methods to leverage GMM's modeling of spectral envelopes and FW's... Read more

Fast locally linear embedding algorithm for exemplar-based voice conversion

by Yi-Wen Liu

2022

Key finding: Developed a fast locally linear embedding (LLE) algorithm for exemplar-based non-parametric VC, precomputing local clusters offline to reduce online matrix inversion complexity. The fast-LLE achieves comparable output quality... Read more

A new multi-speaker formant synthesizer that applies voice conversion techniques

by Juan M Montero

2023, 7th European Conference on Speech Communication and Technology (Eurospeech 2001)

Key finding: Designed a multi-speaker parameter concatenation-based formant synthesizer using stored parameter sets and linear transformation functions to generate different speaker voices from a base set. The approach uses linear... Read more

All papers in Voice Conversion

Cepstrum based Voice Transformation using ANN

by Mukesh A Zaveri SVNIT

2025

The basic goal of a voice conversion system is to mimic the characteristics of the target speaker's voice while keeping the linguistic and paralinguistic information intact. The characteristics of a speaker in speech reflect at different level... more

J.H. Nirmal; Suparva Patnaik; Mukesh Zaveri

by Mukesh A Zaveri SVNIT

2025, IJCA Proceedings on International Conference in Computational Intelligence

A PROSODY-DRIVEN EXTENSION TO EMBEDDING-GUIDED NEURAL VOICE CONVERSION FOR NATURAL AND EXPRESSIVE SPEECH SYNTHESIS

by A BALA RAJU

2025, Industrial Engineering Journal

Recent advancements in voice conversion systems have been largely driven by deep learning techniques, enabling the high-quality synthesis of human speech. However, existing models often fail to generate emotionally expressive speech,... more

Lightweight, Multi-speaker, Multi-lingual Indic Text-To-Speech

by bira singh

2025, IEEE open journal of signal processing

The creation of the dataset has been supported by Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) on behalf of the German Ministry for Economic Cooperation and Development.

Real-time and non-real-time voice conversion systems with web interfaces

by Maxim Vashkevich

2025

Two speech processing systems have been developed for real-time and non-real-time voice conversion. Using the real-time processing the user can apply conversion during voice over IP (VoIP) calls imitating identity of a specified target... more

Real-time pitch modification system for speech and singing voice

by Maxim Vashkevich

2025

A real-time pitch modification system has been developed. The implemented processing scheme is based on hybrid deterministic/stochastic decomposition of the signal and includes extraction of instantaneous pitch, pitch-synchronous... more

Developing a controller pilot data link communication simulator

by manuel dias

2025

Radio frequencies for controller pilot communication are becoming a scarce resource due to increasing air traffic worldwide. The controller pilot data link communication (CPDLC) technology, which is already mandatory in new aircraft,... more

Amélioration de la conversion de voix chuchotée enregistrée par capteur NAM vers la voix audible

by Viet Anh Tran

2025

The NAM-to-speech conversion proposed by Toda and colleagues which converts Non-Audible Murmur (NAM) to audible speech by statistical mapping trained using aligned corpora is a very promising technique, but its performance is still... more

Hybrid Concatenative Synthesis On The Intersection of Music and Speech

by Diemo Schwarz

2024

In this paper, we describe a concatenative synthesis system which was first designed for a realistic synthesis of melodic phrases. It has since been augmented to become an experimental TTS (Text-to-Speech) synthesizer. Today, it is... more

Spectral envelope estimation, representation, and morphing for sound analysis, transformation, and synthesis

by Diemo Schwarz

2024

Spectral envelopes are very useful in sound analysis and synthesis because of their connection with production and perception models, and their ability to capture and to manipulate important properties of sound using easily understandable... more

Language independent on–off voice over IP source model with lognormal transitions

by Marc O. Eberhard

2024, IET Communications

The recent explosive growth of voice over IP (VoIP) solutions calls for accurate modelling of VoIP traffic. This paper presents measurements of ON and OFF periods of VoIP activity from a significantly large database of VoIP call... more

HMM-based sCost quality control for unit selection speech synthesis

by Sathish Pammi

2024

This paper describes the implementation of a unit selection text-to-speech system that incorporates a statistical model Cost (sCost), in addition to target and join costs, for controlling the selection of unit candidates. sCost, a quality... more

La generación de recursos para la mejora de la comprensión auditiva en griego antiguo mediante la inteligencia artificial

by Iván Andrés-Alba

2024, I Congreso en Innovación Docente de las Universidades Madrileñas: MadrID (Unidad de Apoyo a la Docencia, UAM)

This work explores the use of artificial intelligence to create resources for practising listening comprehension in the teaching of Ancient Greek as a spoken language. Starting from the importance of acquiring this... more

Tc-star: Evaluation plan for voice conversion technology

by Harald Höge

2024

Modification of harmonic peak-to-valley ratio for controlling roughness in voice conversion

by Anurag Verma

2024, Electronics Letters

A method for modifying voice quality attributes, i.e. breathiness and roughness, is presented in the context of voice conversion. Both breathiness and roughness of a speaker are collectively modelled by harmonic peak-to-valley ratio... more

Enhancement of esophageal speech using voice conversion techniques

by Kais Ouni

2024, HAL (Le Centre pour la Communication Scientifique Directe)

This paper presents a novel approach for enhancing esophageal speech using voice conversion techniques. Esophageal speech (ES) is an alternative voice that allows a patient with no vocal cords to produce sounds after total laryngectomy:... more

Estimation du pitch et décision de voisement par compression spectrale de l’autocorrélation du produit multi-échelle

by Mohamed Anouar BEN MESSAOUD

2024

Reconnaissance Automatique des chiffres arabes en milieu réel par fusion audiovisuelle

by nadia bakir

2024

In this article, we present an Automatic Speech Recognition (ASR) system that combines acoustic and visual data. This audiovisual recognition system uses as its recognition engine... more

Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems

by Simon Dobrisek

2024, Lecture Notes in Computer Science

The paper presents and evaluates a speaker de-identification technique using speech recognition and two speech synthesis techniques. The phoneme recognition system is built using HMM-based acoustical models of context-dependent diphone... more

Explicit Modelling of State Duration Correlations in Hidden Markov Models

by Martin Russell

2024

Memorandum 4152. Explicit Modelling of State Duration Correlations in Hidden Markov Models. M J Russell and L Siine, September 1988. Abstract: In recent years considerable effort has been directed... more

Classification d'Alzheimer à partir de paramètres acoustiques et prosodique avec de l'apprentissage automatique

by Jalal Al-Tamimi

2024

Improvement to a NAM captured whisper-to-speech system

by Hélène Loevenbruck

2024

In this paper, new techniques to improve whisper-to-speech conversion are investigated, in the framework of silent speech telephone communication. A preliminary conversion method from Non-Audible Murmur (NAM) to modal speech, based on... more

Adaptive Frequency Warping for Improved Spectral Modeling

by Preeti Rao

2024

The compact representation of harmonic amplitudes in the sinusoidal coding of voiced speech is often achieved by the all-pole modeling of a spectral envelope. The perceptual accuracy of the representation may be enhanced by the use of... more

On the application of RLS adaptive filtering for voice pitch modification

by Luiz Biscainho

2024, Proceedings of the 10th …

This paper presents a pitch modification scheme, based on the recursive least-squares (RLS) adaptive algorithm, for speech and singing voice signals. The RLS filter is used to determine the linear prediction (LP) model on a... more

Stockage de données

by mahfoud Elfagrich

2024, chapter

In science, efficient data storage and management are crucial for research, analysis, and collaboration. Various file formats have been developed to meet the specific needs of... more

Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra

by Kais Ouni

2024, International Journal of Speech Technology

Challenges in Speech Synthesis

by Harald Höge

2024, Springer eBooks

Table 2.1 Evolution of number of participants in speech synthesis challenges

Subjective evaluation of an emotional speech database for Basque

by Jon Sanchez

2024, Proc. 6th …

This paper describes the evaluation process of an emotional speech database recorded for standard Basque in order to determine its adequacy for the analysis of emotional models and its use in speech synthesis. The corpus consists of seven... more

Voice Transformation Algorithms With Real Time Dsp Rapid Prototyping Tools

by graziano bertini

2024

Publication in the conference proceedings of EUSIPCO, Antalya, Turkey, 2005

Novel speech duration modifier for packet based communication system

by Senthil Mani

2024, Interspeech 2014

In this paper, we propose a real-time method for duration modification of speech for packet based communication system. While there is rich literature available on duration modification, it fails to clearly address the issues in real-time... more

Evaluation of voice codecs for the Australian mobile satellite system

by Malcolm Wilkinson

2024

The evaluation procedure to choose a low bit rate voice coding algorithm is described for the Australian land mobile satellite system. The procedure is designed to assess both the inherent quality of the codec under 'normal'... more

A Holistic Glottal Phase-Related Feature

by Aníbal Ferreira

2024

This paper addresses a phase-related feature that is time-shift invariant, and that expresses the relative phases of all harmonics with respect to that of the fundamental frequency. We identify the feature as Normalized Relative Delay... more

Static Features in Isolated Vowel Recognition at High Pitch

by Aníbal Ferreira

2024, Proceedings of the International Conference on Signal Processing and Multimedia Applications

Vowel recognition is frequently based on Linear Prediction (LP) analysis and formant estimation techniques. However, the performance of these techniques decreases in the case of female or child speech because at high pitch frequencies... more

Residual prediction

by Harald Höge

2024, Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005.

Residual prediction is a technique that aims at recovering the spectral details of speech that was encoded using parameterizations such as linear predictive coefficients. Example applications of residual prediction are hidden Markov model-based... more

On quality of experience (QoE) for multimedia services in communication ecosystem

by Khalil Ahmed laghari

2024

The contributions of this thesis could be broadly divided into three parts.

Figures in the thesis include: Figure 1 (Conceptual diagram of QoE and influencing factors interaction between human, technology, business and context as well as their effect on QoE); Figure 4 (Inter-domain interaction); Figure 5 (Proposed high-level communication ecosystem); Figure 6 (High-level diagram for QoE interaction in communication ecosystem); Figure 14 (Research model of video streaming study); Figure 24 (Classification of call setup faults); Figure 26 (Impairment scale for (a) often category, (b) never category); Figure 27 (Customer preferences based on customer telephone handset); Figure 31(b) (Subjective QoE vs virtual room size); Figure 34 (LE for male and female participants in virtual acoustic rooms); Figure 35 (Basic functions of QOM); Figure 37 (Screenshot of web-based client interface); Figure 39 (Database structure); Figure 44 (Architecture of the Android application for QoE measurement); Figure 45 (Screenshots from the QoE measurement application); Figure 46 (Cost function graphs of two methods).

Tables in the thesis include: Table 1 (Summary of contribution phase 2); Table 3 (Comparison of various disciplines w.r.t. QoE); Table 5 (Comparison between our proposal models with respect to other models); Table 6 (QoS for multimedia services, ITU-T G.1010); Table 9 (Human physiological and cognitive factors); Table 10 (Response time of webpage download and telephony vs. human reaction); Table 11 (Test setup table); Table 12 (Experimental data, raw decision table); Tables 13 and 14 (Discretized data table); Table 15 (Comparison between PSTN and VoIP); Table 16 (Perceived availability versus call setup faults); Table 17 (Perceived call quality versus technical faults); Table 18 (Comparative analysis of PSTN and VoIP faults); Table 19 (Customer preferences based on customer handset); Table 20 (Overall satisfaction differentiated based on customer age); Table 23 (VAE and successful and unsuccessful gender groups of test participants); Table 24 (Analysis of human QoE factors in relation to virtual acoustic environment); Table 25 (Comparison of different video quality tools with QOM).

An empirical study of delay jitter management policies

by Kevin Jeffay

2024, Elsevier eBooks

This paper presents an empirical study of several policies for managing the effect of delay jitter on the playout of audio and video in computer-based conferences. The problem addressed is that of managing the fundamental tradeoff between... more

This work was supported by the National Science Foundation (grant numbers CCR-9110938 and ICI-9015443), and by the IBM and Intel Corporations.

Real-time Video Quality Assessment

by Gerardo Rubino

2024

There is a great demand to assess video quality transmitted in real time over packet networks, and to make this assessment in real time too. Quality assessment is achieved using two types of methods: objective or subjective. Subjective... more

On-line signature verification using Gaussian Mixture Models and small-sample learning strategies

by Julian Arias-Londoño

2024, Revista Facultad de Ingeniería Universidad de Antioquia

Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

by An Ji

2024, IEEE/ACM Transactions on Audio, Speech, and Language Processing

This paper is NOT THE PUBLISHED VERSION, but the author's final, peer-reviewed manuscript. The published version may be accessed by following the link in the citation below.

Non-linear dictionary representation of deep features for face recognition from a single sample per person

by MOHAMMED OUANAN

2024, Procedia Computer Science

Unconstrained face recognition remains a challenging problem due to intra-class variations caused by occlusion, disguise, varying orientations, facial expressions, age variations, and illumination in real circumstances, etc. The... more

Quality assessment of voice converted speech using articulatory features

by MOHAMMADI ZAKI

2024, arXiv (Cornell University)

We propose a novel application based on acoustic-to-articulatory inversion (AAI) towards quality assessment of voice converted speech. The ability of humans to speak effortlessly requires coordinated movements of various articulators,... more

Power and frequency efficient virtual cellular network

by 文幸安達

2024, The 57th IEEE Semiannual Vehicular Technology Conference, 2003. VTC 2003-Spring.

Recently, major services provided by mobile communications systems are shifting from voice conversations to data communications over the Internet. There is a strong demand for increasing the data transmission rate. However, an important... more

Design of Voice Privacy System using Linear Prediction

by Madhu Kamble

2024, 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Speaker’s identity is the most crucial information exploited (implicitly) by an Automatic Speaker Verification (ASV) system. Numerous attacks can be obliterated simultaneously if privacy preservation is exercised for a speaker’s identity.... more

Introduction to the Special Issue on Intrinsic Speech Variations

by Christian Wellekens

2024, Speech Communication

A new multi-speaker formant synthesizer that applies voice conversion techniques

by José Luís Vallejo

2024, 7th European Conference on Speech Communication and Technology (Eurospeech 2001)

We present a multi-speaker formant synthesizer based on parameter concatenation. The user can choose among three speakers, two males and one female. The synthesizer stores all the parameters for the basic speaker and linear transformation... more

Prosody Modifications for Voice Conversion

by jitendra dhiman

2024

Generally defined, speech modification is the process of changing certain perceptual properties of speech while leaving other properties unchanged. Among the many types of speech information that may be altered are rate of articulation, pitch and formant characteristics. Modifying speech parameters such as pitch, duration and strength of excitation by a desired factor is termed prosody modification. In this thesis, prosody modifications for a voice conversion framework are presented. Among all the speech modifications for prosody, two are important: first, modification of duration and pauses in a speech utterance (time-scale modification), and second, modification of the pitch (pitch-scale modification). Prosody modification involves changing the pitch and duration of speech without affecting the message and naturalness. In this work, time-scale and pitch-scale modifications of speech are discussed using two methods: Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) and an epoch-based approach. Although there are many variations of TD-PSOLA, the variant discussed in this thesis applies the desired modifications directly to the speech signal in the time domain. The epoch-based approach involves modifications of the LP residual. Among the various perceptual properties of speech, the pitch contour plays a key role, as it defines the intonation patterns of a speaker. Prosody modification in a voice conversion framework involves modifying the source pitch contour to follow the pitch contour of the target, which requires prediction of the target pitch contour. The mean/variance method for pitch contour prediction is explored. Sinusoidal modeling has been successfully applied to a broad range of speech processing problems. It offers advantages over linear predictive modeling and the short-time Fourier transform for speech analysis/synthesis and modification. The parameter estimation of sinusoidal modeling, which permits flexible time- and frequency-scale voice modifications, is presented. Speech synthesis using three models (sinusoidal, harmonic and harmonic-plus-residual) is discussed.
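
The mean/variance method mentioned in this abstract is commonly formulated as a linear rescaling of the source pitch contour so that its statistics match those of the target speaker. The following is a minimal sketch of that common formulation, not the thesis's own implementation. The function name, the use of 0 Hz to mark unvoiced frames, and the numeric values are all assumptions made for illustration.

```python
import numpy as np

def convert_f0_mean_variance(f0_source, src_mean, src_std, tgt_mean, tgt_std):
    """Rescale a source F0 contour (Hz) to the target speaker's mean and
    standard deviation. Hypothetical helper, for illustration only."""
    f0_source = np.asarray(f0_source, dtype=float)
    # Assumption: unvoiced frames are marked with 0 Hz and left untouched.
    voiced = f0_source > 0
    converted = np.zeros_like(f0_source)
    converted[voiced] = tgt_mean + (tgt_std / src_std) * (f0_source[voiced] - src_mean)
    return converted

# Hypothetical speaker statistics; in practice they would be estimated
# from the voiced frames of each speaker's training utterances.
f0 = np.array([0.0, 110.0, 120.0, 130.0, 0.0])
out = convert_f0_mean_variance(f0, src_mean=120.0, src_std=10.0,
                               tgt_mean=220.0, tgt_std=20.0)
print(out)
```

Some systems apply the same transformation to log-F0 rather than to F0 in Hz; the abstract does not say which domain the thesis uses.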
