Papers by Yannis Agiomyrgiannakis
Coding with Side Information Techniques for LSF Reconstruction in Voice Over IP
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005
Voice Morphing that improves TTS quality using an optimal dynamic frequency warping-and-weighting transform
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016
Towards flexible speech coding for speech synthesis: an LF + modulated noise vocoder
Interspeech, 2008
The harmonic model codec (HMC) framework for VoIP
Interspeech, 2007

2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004
This paper addresses the problem of expanding the bandwidth of narrowband speech signals, focusing on the estimation of highband spectral envelopes. It is well known that there is not enough mutual information between the two bands. We show that this is because narrowband spectral envelopes have a one-to-many relationship with highband spectral envelopes. A combined estimation/coding scheme for the missing spectral envelope is proposed, which exploits this relationship to produce a high-quality highband reconstruction, provided that an appropriate excitation is available. Subjective tests using the TIMIT database indicate that 134 bits/sec for the highband spectral envelope are adequate for a DCR score of 4.41, an improvement of 22.8% in DCR score over a typical estimation of highband envelopes using conventional mapping functions.

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, 2007
Voice conversion methods have the objective of transforming speech spoken by a particular source speaker so that it sounds as if spoken by a different target speaker. The majority of voice conversion methods are based on transforming the short-time spectral envelope of the source speaker, using correspondences between the source and target vectors derived from training speech data from both speakers. These correspondences are usually obtained by segmenting the spectral vectors of one or both speakers into clusters, using soft (GMM-based) or hard (VQ-based) clustering. Here, we propose that voice conversion performance can be improved by taking advantage of the fact that the relationship between the source and target vectors is often one-to-many. To illustrate this, we propose that a VQ approach, namely conditional vector quantization (CVQ), can be used for voice conversion. Results indicate that such a relationship between the source and target data indeed exists and can be exploited by a CVQ-based function for voice conversion.
Stochastic Modeling and Quantization of Harmonic Phases in Speech using Wrapped Gaussian Mixture Models
2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, 2007
Harmonic sinusoidal representations of speech have proven to be useful in many speech processing tasks. This work focuses on the phase spectra of the harmonics and provides a methodology to analyze and subsequently model the statistics of the harmonic phases. To do so, we propose the use of a wrapped Gaussian mixture model (WGMM), a model suitable for random…

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011
Spectral envelopes of speech signals are typically obtained by making stationarity assumptions about the signal which are not always valid. The Adaptive Quasi-Harmonic Model (AQHM), a non-stationary signal model, is capable of capturing the time-varying quasi-harmonics in voiced speech. This paper suggests the use of AQHM in a multi-layer scheme which results in a high-resolution time-frequency representation of speech. This representation is then used for the recovery of the evolving spectral envelope, thus introducing a time-frequency spectral envelope estimation algorithm related to the Papoulis-Gerchberg algorithm for data extrapolation. Results on voiced speech sounds show that the estimated spectral envelopes are smoother than those produced by state-of-the-art spectral envelope estimators, while maintaining the important spectral details of the speech spectrum.
2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009
Two ARX-LF-based source/filter models for speech signals are presented. A robust glottal inversion technique is used to deconvolve the signal into an excitation component and a filter component. The excitation component is further decomposed into an LF part and a residual part. The first model, referred to as the LF-vocoder, is a high-quality vocoder that replaces the residual part with modulated noise. The second model uses a sinusoidal harmonic representation of the residual signal. The latter does not degrade the signal during analysis/synthesis and provides higher quality for small modification factors, while the former has the advantage of being a compact, fully parametric representation that is suitable for low-bit-rate speech coding as well as parametric speech synthesis applications.

Sinusoidal coding of speech for voice over IP
It is widely accepted that Voice-over-Internet-Protocol (VoIP) will dominate wireless and wireline voice communications in the near future. Traditionally, a minimum level of Quality-of-Service is achieved by careful traffic monitoring and network fine-tuning. However, this solution is not feasible when there is no possibility of controlling or monitoring the parameters of the network. For example, when speech traffic is routed through the Internet, there are increased packet losses due to network delays combined with the strict end-to-end delay requirements of voice communication. Most of today's speech codecs were not initially designed to cope with such conditions. One solution is to introduce channel coding at the expense of end-to-end delay. Another is to perform joint source/channel coding of speech by designing speech codecs which are natively robust to increased packet losses. This thesis proposes a framework for developing speech codecs that are robust to packet losses. The problem is addressed at two levels: at the basic source/channel coding level, where novel methods are proposed for introducing controlled redundancy into the bitstream, and at the signal representation/coding level, where a novel speech parameterization/modeling is presented that is amenable to efficient quantization using the proposed source coding methods. The speech codec is designed to facilitate high-quality Packet Loss Concealment (PLC). The speech signal is modeled with harmonically related sinusoids, a representation that enables fine time-frequency resolution, which is vital for high-quality PLC. Furthermore, each packet is encoded independently of previous packets in order to avoid desynchronization between the encoder and the decoder upon a packet loss. This leaves some redundancy in the bit-stream.
A number of contributions are made to well-known harmonic speech models. A fast analysis/synthesis method is proposed and used in the construction of an Analysis-by-Synthesis (AbS) pitch detector. Harmonic codecs tend to rely on phase models for the reconstruction of the harmonic phases, introducing artifacts that affect the quality of the reconstructed speech signal. For a high-quality speech reconstruction, quantization of the phase is required. Unfortunately, phase quantization is not a trivial problem because phases are circular variables. A novel phase-quantization algorithm is proposed to address this problem. Harmonic phases are properly aligned and modeled with a Wrapped Gaussian Mixture Model (WGMM), capable of handling parameters that belong to circular spaces. The WGMM is estimated with a suitable Expectation-Maximization (EM) algorithm. Phases are then quantized by extending the efficient GMM-based quantization techniques for linear spaces to WGMM and circular spaces. When packet losses increase, additional redundancy can be introduced using Multiple Description Coding (MDC). In MDC, each frame is encoded in two descriptions; receiving both descriptions provides a high-quality reconstruction, while receiving one description provides a lower-quality reconstruction. With current GMM-based MDC schemes it is possible to quantize the amplitudes of the harmonics, which represent an important portion of the information of the speech signal. A novel WGMM-based MDC scheme is proposed and used for MDC of the harmonic phases. It is shown that it is possible to construct high-quality MDC codecs based on harmonic models. Furthermore, it is shown that the redundancy between the MDC descriptions can be used to "correct" bit errors that may have occurred during transmission.
At the source coding level, two techniques are proposed: a scheme for Multiple Description Transform Coding (MDTC) of multivariate Gaussian sources using Parseval frame expansions, and a source coding technique referred to as Conditional Vector Quantization (CVQ). The MDTC algorithm is extended to generic sources that can be modeled with a GMM. The proposed frame facilitates a computationally efficient Optimal Consistent Reconstruction (OCR) algorithm and Cooperative Encoding (CE). In CE, the two MDTC encoders cooperate in order to provide better central/side distortion tradeoffs. The proposed scheme offers scalability, low complexity and storage requirements, excellent performance at low redundancies and competitive performance at high redundancies. In CVQ, the focus is on correcting the most frequent types of errors: single and double packet losses. Furthermore, CVQ finds application in Bandwidth Expansion (BWE), the extension of narrowband speech to wideband. Concluding, two proof-of-concept harmonic codecs are constructed: a single-description and a multiple-description codec. Both codecs are narrowband and variable-rate, similar in quality to the state-of-the-art iLBC (internet Low Bit-Rate Codec) under perfect channel conditions and better than iLBC when packet losses occur. The single description codec…
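The central/side distortion tradeoff behind MDC can be illustrated with the classic staggered scalar-quantizer construction, a much simpler scheme than the GMM/WGMM-based MDC proposed in the thesis; the names below are illustrative. Each description alone yields a coarse reconstruction, while both together interleave their grids and halve the effective quantization step.

```python
def mdc_encode(x, step):
    """Two staggered uniform quantizers: description 2's grid is offset by
    half a step, so the two grids interleave."""
    d1 = round(x / step)          # grid at integer multiples of step
    d2 = round(x / step - 0.5)    # grid at half-integer multiples of step
    return d1, d2

def mdc_decode(d1=None, d2=None, step=1.0):
    """Central decoder averages the two grids (error <= step/4); either
    side decoder alone still reconstructs within step/2."""
    if d1 is not None and d2 is not None:
        return (d1 * step + (d2 + 0.5) * step) / 2.0
    if d1 is not None:
        return d1 * step
    return (d2 + 0.5) * step
```

Losing either description degrades gracefully instead of catastrophically, which is exactly the property exploited when packet losses increase.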

A frequency-weighted post-filtering transform for compensation of the over-smoothing effect in HMM-based speech synthesis
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014
Over-smoothing is one of the major sources of quality degradation in statistical parametric speech synthesis. Many methods have been proposed to compensate for over-smoothing, with the speech parameter generation algorithm considering Global Variance (GV) being one of the most successful. This paper models over-smoothing as a radial relocation of the poles and zeros of the spectral envelope towards the origin of the z-plane, and uses radial scaling to enhance spectral peaks and deepen spectral valleys. The radial scaling technique is improved by introducing over-emphasis, spectral-tilt compensation and frequency weighting. Listening test results indicate that the proposed method is 11%-13% more preferable than GV, while having lower algorithmic delay (only 5 ms) and computational complexity.
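The core radial-scaling operation on an all-pole (LPC) envelope can be sketched in a few lines: substituting z → z/γ multiplies the k-th predictor coefficient by γ^k and moves every root of the polynomial from radius r to γr. This is a minimal sketch of the basic operation only; the paper's over-emphasis, spectral-tilt compensation and frequency weighting are not reproduced here.

```python
def radial_scale(lpc, gamma):
    """A(z) -> A(z/gamma): coefficient a_k becomes a_k * gamma**k, which
    moves every root of A (pole of 1/A) from radius r to gamma * r.
    gamma < 1 pulls poles toward the origin (smoothing); gamma > 1 pushes
    them outward (peak sharpening), provided all roots stay inside the
    unit circle for stability."""
    return [a * gamma ** k for k, a in enumerate(lpc)]
```

For example, the first-order polynomial 1 - 0.9 z^-1 has its root at z = 0.9; scaling with γ = 1.05 moves that root to 0.945, closer to the unit circle, which sharpens the corresponding spectral peak.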
Fast Analysis/Synthesis of Harmonic Signals
2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, 2006
Miltiadis Vasilakis, Yannis Agiomyrgiannakis and Yannis Stylianou (Computer Science, University of Crete, Hellas, and Institute of Computer Science, FORTH)

IEEE Transactions on Audio, Speech and Language Processing, 2000
In many speech-coding-related problems, there is available information and lost information that must be recovered. When there is significant correlation between the available and the lost information sources, coding with side information (CSI) can be used to exploit the mutual information between the two sources. In this paper, we treat CSI as a special VQ problem, referred to as conditional vector quantization (CVQ). A fast two-step divide-and-conquer solution is proposed. CVQ is then applied to two problems: the recovery of highband (4-8 kHz) spectral envelopes for speech spectrum expansion, and the recovery of lost narrowband spectral envelopes for voice over IP. Comparisons with alternative approaches such as estimation and simple VQ-based schemes show that CVQ provides significant distortion reductions at very low bit rates. Subjective evaluations indicate that CVQ provides noticeable perceptual improvements over the alternative approaches.
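The two-step idea behind CVQ can be sketched with a scalar toy example: a coarse quantizer on the available side information x selects one of several small codebooks trained on the lost value y, so the transmitted index spends its bits only on what x cannot predict. This is an illustrative sketch, not the paper's algorithm; all names and the 1-D setting are assumptions.

```python
def kmeans(points, k, iters=20):
    """Tiny 1-D k-means (deterministic quantile initialisation)."""
    pts = sorted(points)
    centers = [pts[(2 * j + 1) * len(pts) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            groups[i].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

class ConditionalVQ:
    """Toy conditional VQ: a coarse quantizer on the side information x
    selects one of several codebooks for the lost value y."""

    def __init__(self, pairs, n_cells=4, n_codewords=2):
        xs = sorted(x for x, _ in pairs)
        # cell boundaries = quantiles of the side information
        self.edges = [xs[len(xs) * i // n_cells] for i in range(1, n_cells)]
        buckets = [[] for _ in range(n_cells)]
        for x, y in pairs:
            buckets[self._cell(x)].append(y)
        self.codebooks = [kmeans(b, n_codewords) for b in buckets]

    def _cell(self, x):
        return sum(x >= e for e in self.edges)

    def encode(self, x, y):
        # only log2(n_codewords) bits are sent; x is known at the decoder
        book = self.codebooks[self._cell(x)]
        return min(range(len(book)), key=lambda j: (y - book[j]) ** 2)

    def decode(self, x, index):
        return self.codebooks[self._cell(x)][index]
```

When the relationship between x and y is one-to-many (e.g. y clusters at two offsets around x), a per-cell codebook with two codewords captures both branches, whereas any single mapping function must average them.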

IEEE Transactions on Audio, Speech, and Language Processing, 2000
The harmonic representation of speech signals has found many applications in speech processing. This paper presents a novel statistical approach to model the behavior of harmonic phases. Phase information is decomposed into three parts: a minimum phase part, a translation term, and a residual term referred to as dispersion phase. Dispersion phases are modeled by wrapped Gaussian mixture models (WGMMs) using an expectation-maximization algorithm suitable for circular vector data. A multivariate WGMM-based phase quantizer is then proposed and constructed using novel scalar quantizers for circular random variables. The proposed phase modeling and quantization scheme is evaluated in the context of a narrowband harmonic representation of speech. Results indicate that it is possible to construct a variable-rate harmonic codec that is equivalent to iLBC at approximately 13 kbps.
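As a concrete illustration of the circular statistics involved, here is a minimal sketch (not the paper's implementation) of a wrapped Gaussian mixture density: each component is an ordinary Gaussian density summed over 2π translates of its argument, so the mixture integrates to one over the circle and is 2π-periodic. Function names and parameter values are illustrative.

```python
import math

def wrapped_gaussian_pdf(theta, mu, sigma, n_wraps=10):
    """Density on the circle [-pi, pi): a Gaussian density summed over
    2*pi translates of its argument (truncated to +/- n_wraps wraps)."""
    total = 0.0
    for k in range(-n_wraps, n_wraps + 1):
        x = theta + 2.0 * math.pi * k
        total += math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return total / (sigma * math.sqrt(2.0 * math.pi))

def wgmm_pdf(theta, weights, mus, sigmas):
    """Mixture of wrapped Gaussians; weights must sum to one."""
    return sum(w * wrapped_gaussian_pdf(theta, m, s)
               for w, m, s in zip(weights, mus, sigmas))
```

The wrapping is what makes the model suitable for phases: unlike a linear GMM, the density assigns the same value to θ and θ + 2π, so cluster means near ±π behave correctly.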
Systems and Methods for Three-Dimensional Audio CAPTCHA