Papers by Christophe Savariaux

Journal of Phonetics, 2019
This study investigates how L1 and L2 speakers of French produce phonetic correlates of French prosodic structure, specifically the properties of Accentual Phrases that are evidenced in dimensions other than f0. The L2 speakers had English L1, with varying levels of proficiency in French. We also examined the same individuals' productions of sentences in English. Differences in prosodic structure between English and French lead us to expect differences between the two groups of speakers. Our study measured jaw displacement in electromagnetic articulography, as well as acoustic duration and vowel formant values. Patterns of variation across the syllables of the stimulus sentences were evaluated by comparing normalized values for individual syllables; significance testing was not used as it could not capture the relatively small syllable-to-syllable differences. In French, we found that despite substantial individual variation, the L1 speakers generally show a tendency for expanded articulation (greater jaw displacement, and F1 values corresponding to this) and longer durations on syllables that are final in an Accentual Phrase (identified using f0 cues). The most obvious differences in the L2 speakers' productions were seen in polysyllabic words, particularly cognates, where less advanced speakers tended to produce expanded articulations on syllables that would receive lexical stress in English but no accentuation in French. For English, limited data show greater consistency among the L2 (native French) speakers, possibly because they feel less able to exploit the variability in prominence placement that English allows.
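The abstract does not say how the per-syllable values were normalized; a minimal sketch of one common choice, z-scoring each measure within a speaker and sentence, is given below. This is an assumption for illustration, not the study's actual procedure.

```python
# Hypothetical sketch: z-score normalization of per-syllable measures
# (e.g., jaw displacement, duration, F1) within one speaker's sentence,
# so patterns of variation can be compared across speakers.
# The study's actual normalization is not specified in the abstract.
from statistics import mean, stdev

def z_normalize(values):
    """Return z-scores of a list of per-syllable measurements."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Invented jaw-displacement values (mm) for the syllables of one sentence;
# the phrase-final syllable (last value) shows the expanded articulation.
jaw_mm = [8.1, 7.9, 9.4, 8.0, 11.2]
normalized = z_normalize(jaw_mm)
print(normalized[-1] == max(normalized))  # phrase-final syllable stands out
```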

theses.fr, 1995
Study of the distal control space in speech production: lessons from a perturbation using a lip tube. Christophe SAVARIAUX. Abstract: This thesis is part of a long-term project of the articulatory team at ICP, whose goal is to build and control an anthropomorphic robot capable of producing speech through processes that simulate the strategies of a human speaker. In such a project, an important step is to understand how the task is specified for the speaker, and more precisely what the distal control space is (acoustic or articulatory). To determine the respective weights of these two spaces in the control of vowel production, we chose an experimental approach based on perturbing habitual production strategies. To this end, we inserted between the lips of 11 speakers a lip tube 20 mm in diameter, thus imposing a large labial area, and asked them to produce under these conditions the vowel [u], usually produced with a small labial opening. We then collected articulatory (X-ray) and acoustic (produced speech signal) data. The strategies developed to cope with the perturbation were analyzed, first on the basis of an acoustic description using the F1/F2 formant pattern, and then using the results of perceptual tests carried out to evaluate the perceptual quality of the produced sounds. The results confirmed that compensation was possible through a complete articulatory reorganization. We were then able to show that the speakers' behavior was strongly guided and controlled toward improving the acoustic outcome: the auditory target is therefore clearly present in the speaker's task. Finally, we proposed integrating these results into a general scheme of speech production control.
Nasal diphthongs are quite rare in the world's languages. This paper analyzes how speakers control articulatory movements for nasal diphthongs in Brazilian Portuguese (BP). Our aim is to characterize the oral-nasal coupling in posterior nasal diphthongs in the Paulistano dialect spoken in the city of São Paulo. We show that oral and nasal diphthongs differ in tongue contour, in addition to velopharyngeal coupling. A 2D EMA study was carried out to contrast [aw] and [ãw̃] in monosyllabic words.

Recent progress in the INRS speech recognition system
The Journal of the Acoustical Society of America, 1997
The INRS large‐vocabulary continuous‐speech recognition system employs a two‐pass search. First, inexpensive models prune the search space; then a powerful language model and detailed acoustic–phonetic models scrutinize the data. A new fast match with two‐phone lookahead and pruning speeds up the search. In language modeling, excluding low‐count statistics reduces memory (50% fewer bigrams and 92% fewer trigrams); with Wall Street Journal texts, excluding single‐occurrence bigrams and trigrams with counts less than five yields little performance decrease. In acoustic modeling, separate male and female right‐context VQ models and a bigram language model are used in the first pass, and right‐context continuous models and a trigram language model are used in the second pass. A shared‐distribution clustering uses a distortion measure based only on the weights of Gaussian mixtures in the HMM model. Testing the system with a 5000‐word vocabulary, the word inclusion rate (i.e., correct word retained in the first…
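The low-count exclusion described above amounts to a count threshold over the n-gram tables. A generic sketch, with invented counts and a hypothetical `prune_ngrams` helper (not the INRS system's code); the thresholds follow the abstract's description:

```python
# Generic illustration of n-gram pruning by occurrence count: drop
# single-occurrence bigrams and trigrams seen fewer than five times.
# Counts and the helper name are invented; this is not the INRS code.

def prune_ngrams(counts, min_count):
    """Keep only n-grams whose occurrence count meets the threshold."""
    return {ngram: c for ngram, c in counts.items() if c >= min_count}

bigrams = {("the", "market"): 12, ("rose", "sharply"): 1}
trigrams = {("the", "market", "rose"): 7, ("market", "rose", "sharply"): 3}

pruned_bi = prune_ngrams(bigrams, min_count=2)    # excludes single occurrences
pruned_tri = prune_ngrams(trigrams, min_count=5)  # excludes counts below five
print(len(pruned_bi), len(pruned_tri))  # → 1 1
```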

Proceedings of the …, 2001
In this paper, the first results from a longitudinal study of speech production in French are presented for 9 patients who underwent vocal tract surgery, including partial or total glossectomy or pelvi-glossectomy and tongue reconstruction, or partial mandibulectomy. For each patient, two recordings were made of the acoustic speech signal, the first one a few days before surgery and the second between 3 weeks and 12 months after surgery. The analysis of the data aimed at two main objectives. First, we wanted to quantitatively assess the degree of speech production impairment induced by the surgery and to propose explanations in terms of articulation. Second, we were interested in observing and understanding how some of the subjects with a strong impairment were able to modify their speech production control strategies in order to deal with the dramatic modifications of their oral cavity. Data analysis was based on formant patterns for the vowels, on Lisker and Abramson's VOT, and on temporal and spectral properties of the burst for the consonants. Speech production after surgery was evaluated on the basis of a comparison with the pre-surgery recording, which was considered a reference describing the patients' speech production under normal conditions. Special attention was paid to the patients who showed the largest post-surgery difficulties, in order to find the origins of their impairment and to study the strategy adopted to deal with it.

Opening the New Technologies of Information and Communication to disabled people is a question of increasing interest nowadays. The TELMA project aims at developing software and hardware bricks for a telecommunication terminal (cellular phone) for hearing-impaired users. This terminal will be augmented with original audiovisual functionalities. More specifically, the TELMA terminal will exploit the visual modality of speech in two main tasks. On the one hand, visual speech information is used to improve speech enhancement techniques in adverse environments (environmental noise reduction enables hearing-impaired users to better exploit their residual acoustic abilities). On the other hand, the terminal will provide analysis/synthesis of lip movements and Cued Speech gestures. Cued Speech is a face-to-face communication method used by part of the oralist hearing-impaired community. It is based on the association of lip shapes with cues formed by the hand at specific locations. The TELMA terminal will translate lipreading + Cued Speech into acoustic speech, and vice versa, so that hearing-impaired people can communicate with each other and with normal-hearing people over telephone networks. To bring together scientific developments, economic perspectives, and the effective integration of disabled people's concerns, the project is built on a partnership between universities (INPG and ENST), an industrial/service company (France Télécom, R&D division), and potential users from the hearing-impaired community, under the supervision of health professionals (Grenoble Hospital Center / ORL).
Labeling audio-visual speech corpora and training an ANN/HMM audio-visual speech recognition system
6th International Conference on Spoken Language Processing (ICSLP 2000)
We present a method to label an audio-visual database and to set up a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. The multi-stage labeling process is presented on a new audiovisual database recorded at the Institut de la Communication Parlée (ICP). The database was generated via transposition of the audio database NUMBERS95. For the labeling, first a large subset of…

7th European Conference on Speech Communication and Technology (Eurospeech 2001)
Acoustic data are presented from a prosodic database containing data from 3 French speakers. The prosodic boundaries examined are the Utterance, the Intonational Phrase, the Accentual Phrase, and the Word. The aim is to study the interaction of coarticulatory effects with prosodic effects. The vowel /a/ before the prosodic boundary and the consonants /b d g f s ʃ/ after the prosodic boundary are examined. It is found that vowel duration is greatly affected by the strength of the prosodic boundary, but consonant duration less so. The duration of the fricative consonants is more stable than that of the stop consonants. Formant values suggest that /a/ is lower and more back the stronger the prosodic boundary, and that the vowel is more likely to reach its low target following a labial consonant /b f/. Based on an examination of formant values, the velar stop /g/ appears to show much variability in the front-back dimension. Finally, there is a strong negative correlation between duration and mean velocity of the formant transition, and this effect is strongly related to the strength of the prosodic boundary.
Compensating for labial perturbation in a rounded vowel: an acoustic and articulatory study
3rd European Conference on Speech Communication and Technology (Eurospeech 1993)
How can the control of the vocal tract limit the speaker's capability to produce the ultimate perceptive objectives of speech?
5th European Conference on Speech Communication and Technology (Eurospeech 1997)
How can the control of the vocal tract limit the speaker's capability to produce the ultimate perceptive objectives of speech? Christophe Savariaux, Louis-Jean Boë & Pascal Perrier, Institut de la Communication Parlée…

7th International Conference on Spoken Language Processing (ICSLP 2002)
Encouraged by the good performance of the DCT in audiovisual speech recognition [1], we investigate how the selection of the DCT features influences the recognition scores in a hybrid ANN/HMM audiovisual speech recognition system on a continuous word recognition task with a vocabulary of 30 numbers. Three sets of features, based on the mean energy, the variance, and the variance relative to the mean value, were chosen. The performance of these features is evaluated in a video-only and an audiovisual recognition scenario with varying Signal to Noise Ratios (SNR). The audiovisual tests are performed with 5 types of additional noise at 12 SNR values each. Furthermore, the results of the DCT-based recognition are compared to those obtained via chroma-keyed geometric lip features [2]. In order to achieve this comparison, a second audiovisual database without chroma-key has been recorded. This database has similar content but a different speaker.
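The three selection criteria named above (mean energy, variance, and variance relative to the mean) can be sketched as simple rankings over per-coefficient statistics. The helper name and the toy frames below are invented for illustration; this is not the paper's feature-extraction code:

```python
# Hypothetical sketch of ranking DCT coefficients by the three criteria
# named in the abstract and keeping the top k. Frames are invented.
from statistics import mean, pvariance

def select_coefficients(frames, k, criterion):
    """Rank coefficients across frames and return the top-k indices.

    frames: list of per-frame DCT coefficient lists (equal length).
    criterion: 'energy' (mean squared value), 'variance', or
               'relative' (variance divided by the squared mean).
    """
    n = len(frames[0])
    scores = []
    for i in range(n):
        coeff = [f[i] for f in frames]
        if criterion == "energy":
            s = mean(c * c for c in coeff)
        elif criterion == "variance":
            s = pvariance(coeff)
        else:  # 'relative'
            m = mean(coeff)
            s = pvariance(coeff) / (m * m) if m else float("inf")
        scores.append((s, i))
    return [i for _, i in sorted(scores, reverse=True)[:k]]

# Toy frames with 3 coefficients: index 0 carries the most energy,
# index 1 varies the most across frames.
frames = [[5.0, 1.0, 0.1], [5.1, -1.0, 0.1], [4.9, 1.2, 0.1]]
print(select_coefficients(frames, 1, "energy"))    # → [0]
print(select_coefficients(frames, 1, "variance"))  # → [1]
```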

Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 1 : JEP, 2016
Speech is often described as a sequencing of units that link linguistic, sensory, and motor representations. Is the link between these representations established preferentially at a specific unit, for example the syllable or the word? In this study, we set out to contrast these two hypotheses. To do so, we modified French speakers' production of the syllable "bé" using an auditory-motor adaptation paradigm, which consists of perturbing auditory feedback. We then studied how this modification transfers to the production of the word "bébé". The results suggest a link between linguistic and motor representations at several levels, both the word and the syllable. They also show an influence of the position of the syllable within the word on the transfer, which raises new questions about the serial control of speech.
Le Centre pour la Communication Scientifique Directe - HAL - Grenoble Ecole de Management, Jun 8, 2020
Plosive consonants are among the most widely represented phonemes in the phonological inventories of the world's languages. Beyond their linguistic role, they also play a paralinguistic role in instrumental and vocal practice, notably within the vocal practice of Human Beatbox. This article sheds light on the similarities and differences in the articulatory dynamics of three French plosive consonants and the corresponding percussive sounds of Human Beatbox. Although these two modes of vocal production share a common root, a different articulatory dynamic is evidenced for Human Beatbox. We find indications of an ejective mechanism, which has an impact on lingual dynamics.

We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment, we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience (i.e., exposure to a double phonological code during childhood) affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identification experiment with bilingual and monolingual adult participants. It was an ABX task involving a Bengali dental-retroflex contrast that does not exist in any of the participants' languages. The phonemes were presented in audiovisual (AV) and audio-only (A) conditions. The results revealed that in the audio-only condition, monolinguals and bilinguals had difficulties discriminating the retroflex non-native phoneme. They were phonologically "deaf" and assimilated it to the d…

This study aims to better understand the origin of increased tapping variability and inaccuracy in people who stutter during paced and unpaced tapping. The overall question is to what extent these timing difficulties are related to a central clock deficit, a deficit in motor execution, or both. Finger tapping behavior of 16 adults who stutter (PWS) with different levels of musical training was compared with the performance of 16 matched controls (PNS) in three finger tapping synchronization tasks (a simple 1:1 isochronous pattern, a complex non-isochronous pattern, and a 4 tap : 1 beat isochronous pattern), a continuation task (without external stimulation), and a reaction task involving aperiodic and unpredictable patterns. The results show that PWS exhibited larger negative asynchrony (expressed as phase angles) and increased synchronization variability (expressed as phase locking values) in paced tapping tasks, and that these differences from the PNS group were modulated by rhythmic…
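The two synchronization measures named above have standard circular-statistics formulations: each tap maps to a phase angle relative to the beat, the angle of the mean resultant vector gives the mean asynchrony, and its length gives the phase locking value. The sketch below is a generic formulation with invented tap times, not the study's analysis pipeline:

```python
# Generic circular measures for paced tapping: phase angle per tap,
# mean asynchrony (angle of the resultant vector), and phase locking
# value (its length: 1 = perfectly consistent, 0 = uniformly scattered).
# Tap times and the 500 ms beat period are invented for illustration.
import cmath
import math

def tap_phases(tap_times, period):
    """Phase angle (radians) of each tap within the beat cycle."""
    return [2 * math.pi * ((t % period) / period) for t in tap_times]

def resultant(phases):
    """Mean resultant vector over unit phasors exp(i * phase)."""
    return sum(cmath.exp(1j * p) for p in phases) / len(phases)

def plv(phases):
    """Phase locking value: length of the mean resultant vector."""
    return abs(resultant(phases))

def mean_asynchrony(phases):
    """Signed mean phase angle; negative = taps lead the beat."""
    return cmath.phase(resultant(phases))

period = 0.5  # 500 ms metronome beat
taps = [0.48, 0.97, 1.46, 1.97, 2.46]  # consistently slightly early
phases = tap_phases(taps, period)
print(plv(phases) > 0.99, mean_asynchrony(phases) < 0)  # → True True
```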

Stuttering is characterized by respiratory, laryngeal, and articulatory peculiarities, especially when the to-be-produced speech is complex. This study examined glottal behavior in people who stutter (PWS) during production of simple bilabial (/p/, /b/, /m/) and complex (/pR/, /bR/) onsets. It was hypothesized that the glottal behavior of PWS would present idiosyncrasies compared to people who do not stutter (PNS), and that these would be modulated by the complexity of the onset. While speakers produced semi-spontaneous speech with embedded target words, acoustic and EGG data were collected from 4 PWS and 4 PNS. From the perceptually fluent productions, the duration of bilabial occlusion, intensity, open quotient (OQ), and the differences in intensity, pitch, and laryngeal OQ between the occlusion phase and the following vowel were measured. No significant differences in glottal behavior were found between PWS and PNS. However, compared to PNS, PWS devoiced voiced consonants significantly more, which motivates a larger-sc…
Le Human Beatbox : un langage musical au défi des limites physiologiques humaines

Evaluation fonctionnelle des reconstructions endobuccales: Un outil intéressant : l'interprétation articulatoire du signal acoustique
Introduction. Evaluating the functional quality of reconstructions after excision of intraoral cancers is certainly an ethical necessity, but it is also becoming a methodological imperative for technical progress. Currently proposed tests are either subjective ("quality of life" questionnaires) or demanding and poorly reproducible (cine-radiography of swallowing). This observation led us to apply an acoustic method that tests the ability to cover the maximal articulatory space. Materials and methods. The method relies on recording a corpus of sounds. The acoustic signals are then digitized and analyzed spectrally and temporally using standard signal-processing techniques. Comparison with a database of undistorted sounds makes it possible to evaluate the severity of the disorder, its location, and finally the quality of rehabilitation. Results. The analysis of the recordings, based on the articulatory description…
The Verbotonal Method and the music to enhance French phonetics