Papers by Jean-Philippe Goldman

A continuous prominence score based on acoustic features
Interspeech 2012
Up to now, prominence detection has mainly been treated as a binary matter: a syllable or a word is considered either prosodically prominent or not. This contribution aims to develop an automatic procedure for detecting gradual prominence. Based on 4 prosodic parameters (relative duration, relative f0, f0 movement and pause duration), the system assigns each syllable a gradual prominence score ranging from 0 (non-prominent syllable) to 4 (extra-prominent syllable). The automatic detection (ProsoProm) relies on a manually annotated corpus (18 minutes, or 3669 syllables, of speech annotated by three experts) and is cumulative: the relative weight of each parameter is taken into account to compute a global score for each syllable. The discussion of the results includes a qualitative analysis of misses and false detections. The agreement between automatic and (median) human annotation reaches a Kappa score of 0.8.
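The cumulative scoring idea can be illustrated with a minimal Python sketch. Everything concrete in it is a hypothetical placeholder chosen for illustration, not the values or code of ProsoProm: the `Syllable` and `prominence_score` names, the parameter units, and the thresholds and weights. Each of the four parameters that reaches its threshold adds its weight, and the sum is rounded half up and clipped to the 0 to 4 scale.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    rel_duration: float  # duration relative to neighbouring syllables (ratio), hypothetical unit
    rel_f0: float        # f0 relative to neighbouring syllables, in semitones (assumed)
    f0_movement: float   # intra-syllabic f0 movement, in semitones (assumed)
    pause_after: float   # duration of the following pause, in seconds (assumed)


# Hypothetical (threshold, weight) pairs; NOT the values used by ProsoProm.
# The weights sum to 4.0 so that a syllable meeting every criterion scores 4.
CRITERIA = {
    "rel_duration": (1.5, 1.5),
    "rel_f0": (2.0, 1.0),
    "f0_movement": (3.0, 1.0),
    "pause_after": (0.2, 0.5),
}


def prominence_score(syl: Syllable) -> int:
    """Cumulative score: every parameter at or above its threshold adds its weight.

    The weighted sum is rounded half up and clipped to the 0..4 prominence scale.
    """
    total = 0.0
    for name, (threshold, weight) in CRITERIA.items():
        if getattr(syl, name) >= threshold:
            total += weight
    return max(0, min(4, int(total + 0.5)))  # total is never negative, so +0.5 rounds half up


print(prominence_score(Syllable(1.6, 0.0, 0.0, 0.0)))  # lengthening only
```

With these toy weights, a syllable that is only lengthened scores 2 (1.5 rounded half up), while one that meets all four criteria scores 4. Rounding half up is used instead of Python's built-in `round`, which rounds halves to the nearest even integer.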
Interspeech 2011
We provide EasyAlign, a user-friendly automatic phonetic alignment tool for continuous speech. It is developed as a plug-in for Praat, the popular speech analysis software, and it is freely available. Its main advantage is that speech can easily be aligned from an orthographic transcription. It requires a few minor manual steps, and the result is a multi-level annotation within a TextGrid composed of phonetic, syllabic, lexical and utterance tiers. Evaluation showed that the performance of this HTK-based aligner is comparable to human alignment and to other existing alignment tools. It was originally fully available for French and English. Community interest in extending it to other languages led to a straightforward methodology for adding languages: Spanish and Taiwan Min have recently been added, and other languages are under development.
Tendances prosodiques de la parole radiophonique
Cahiers de praxématique

7th International Conference on Speech Prosody 2014
This paper presents the results of a prosodic and phonostylistic analysis based on C-PhonoGenre, an 8-hour spoken French corpus comprising 9 speaking situations with, on average, 10 speakers per situation. The corpus was automatically segmented at the phonetic, syllabic and word levels (EasyAlign) and into larger pause-separated units. Part-of-speech annotation (DisMo) and prominent-syllable detection (ProsoProm) were added automatically. The corpus was also manually annotated at the syllabic level for stylistic variants such as post-tonic schwas, liaisons, elisions, disfluencies, audible breaths and noises. Acoustic analyses (ProsoReport, DurationAnalyser) provide more than 100 micro- and macroprosodic measures, which we correlate with the phonostylistic features and the linguistic annotation. The analysis yields a contrastive, fine-grained prosometric description of phonostylistic and situational variation along 4 gradual situational dimensions: audience, media, preparation and interactivity. Further statistical analysis explored the discriminative and explanatory power of combinations of prosodic measures.
ProsoBox, a Praat Plugin for Analysing Prosody
10th International Conference on Speech Prosody 2020
Chapter 2. Orthographic and phonetic transcriptions of Rhapsodie recording
Studies in Corpus Linguistics

Speech Prosody 2016
The voice quality that allows a speaker to be recognized is characterized, among other features, by his or her fundamental frequency (F0), and F0 may differ across languages. On this basis, we investigated whether speakers show different F0 when they speak two different languages. To do this, we carried out a study with a within-speaker design, in which long-term distributional (LTD) F0 level and span measures were examined in early or late bilingual speakers of English and French, of English and German, and of French and German. The results are as follows. English-French speakers had a lower F0 in English than in French. Similarly, English-German speakers had a lower F0 in English than in German; they also showed more variability in English than in German, especially when English was their mother tongue. Finally, French-German speakers showed no differences in F0 level or span between the two languages. These findings, which partially agree with previous studies, not only highlight the advantage of a within-speaker design for neutralizing individual differences but also support the idea that the language being spoken matters for speaker identification.
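One common way to operationalize LTD F0 level and span is to take the median F0 as the level and the distance between the 10th and 90th percentiles, expressed in semitones, as the span. The sketch below assumes that definition; the percentile choice and the function name `ltd_f0_measures` are assumptions, not necessarily the measures used in the study.

```python
import math
import statistics


def ltd_f0_measures(f0_hz):
    """Return (level_hz, span_semitones) for a sequence of F0 values in Hz.

    Assumed definitions: level = median F0; span = distance between the
    10th and 90th percentiles, in semitones (12 * log2 of their ratio).
    Unvoiced frames, encoded as 0 or None, are ignored.
    """
    voiced = [f for f in f0_hz if f]  # drop 0 / None (unvoiced frames)
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 values")
    level = statistics.median(voiced)
    deciles = statistics.quantiles(voiced, n=10)  # 9 cut points: 10th..90th percentile
    p10, p90 = deciles[0], deciles[-1]
    span_st = 12 * math.log2(p90 / p10)
    return level, span_st


level, span = ltd_f0_measures([100] * 5 + [200] * 5)
print(level, span)  # an octave between the percentiles corresponds to 12 semitones
```

Expressing the span in semitones rather than Hz makes it comparable across speakers with different overall F0, which matters when speakers are compared with themselves across two languages.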
To ascertain the merits of different phonetic syllabification algorithms, their performance was compared both against each other, using lexical analysis, and against human syllable boundary placement, using first- or second-syllable repetition of bisyllabic non-words. Perception results show that second-syllable repetition was far more consistent than first-syllable repetition, suggesting that the former is a more accurate measure of boundary placement. Comparison of human and algorithmic syllable boundary placement showed high categorical accuracy for the Dell and Laporte algorithms and suggests using multiple concurrent algorithms to produce a confidence measure for each syllable boundary judgement.
Transformations de la voix pour l'évaluation de systèmes de vérification du locuteur
ABSTRACT
Prosodic transcription of spoken corpora relies mainly on the identification of perceived prominence. However, manual annotation of prominence is extremely time-consuming and varies greatly from one expert to another, so automating the procedure would be highly valuable. In this study, we present the first results of a methodology for the automatic detection of prominent syllables. It is based on (i) a spontaneous French corpus manually annotated according to a strict methodology and (ii) acoustic prosodic parameters, shown to be corpus-independent, that are used to detect prominent syllables. Some automatic tools used to handle large corpora are also described.
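Inter-expert variability of this kind is commonly quantified with Cohen's kappa, and a kappa statistic is also what the ProsoProm evaluation above reports. A minimal sketch for two annotators' label sequences follows; the function name `cohens_kappa` is illustrative, and the study's own agreement procedure may differ (for instance, a weighted kappa for ordinal prominence scores).

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is
    the chance agreement implied by each annotator's label frequencies.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Two hypothetical experts marking syllables as prominent (1) or not (0).
print(cohens_kappa([0, 0, 0, 1], [0, 0, 1, 1]))
```

Here the experts agree on 3 of 4 syllables (0.75), chance agreement is 0.5, so kappa is 0.5: agreement well above chance but far from perfect.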
La phonétisation de "plus", "tous" et de certains nombres.
Collocations Extraction Using a Syntactic Parser
Le phonostyle France-Info et ses ingrédients prosodico-phonétiques. Approche croisée homme-automate
EasyAlign for Brazilian Portuguese: a (semi) automatic segmental tool under Praat
Proceedings of the VIIth GSCP International Conference

Variations in French Accentual Phrase realization. The case of penultimate marking
This paper aims to determine whether the presence of penultimate accentuation varies across varieties of French, to examine the acoustic correlates of penultimate prominence marking from a corpus-based perspective, and to discuss the status of this phenomenon in French phonology. In French, the minimal prosodic unit for pitch accent assignment is not the Phonological Word but a higher constituent in the prosodic hierarchy, the Accentual Phrase (AP). In the varieties of French spoken in northern France, the AP is tonally marked by a high pitch movement associated with its rightmost full syllable and an optional high tone associated with a syllable to its left (see the LHiLH* pattern identified by Jun & Fougeron [2002]). In the regional varieties of French spoken at the periphery of the Hexagon (Belgium and Switzerland), one variant of this pattern involves a prominence on the penultimate syllable of the AP. Little work has dealt with such a prosodic ...
Une étude de la variation régionale de la vitesse d'articulation en français
Voice Transformations for the Evaluation of Speaker Verification Systems
Speech Recognition and Coding, 1995
This paper compares the performance of two speaker verification systems using a new assessment methodology. We first describe a succession of simple voice transformations that alter the fundamental frequency, the speech duration and the vocal tract length while ...
Chapitre 5 : La prosodie de quelques variétés de français parlées en Suisse romande
La variation prosodique régionale en français, 2012
ProsoReportDialog: a tool for temporal variables description in dialogues

This contribution describes an ongoing project, a smartphone application called Voice Äpp, which follows up on a previous application called Dialäkt Äpp. The main purpose of both apps is to identify the user's Swiss German dialect on the basis of dialectal variation in 15 words. The result is returned as one or more geographical points on a map. In Dialäkt Äpp, launched in 2013, the user indicates his or her own pronunciation by pressing buttons, whereas Voice Äpp, currently in development, asks users to pronounce the words and uses speech recognition techniques to identify the variants and localize the user. The second app is more challenging technically but better reflects the spoken nature of dialect variation. In addition, Voice Äpp takes its users on a journey in which they explore the individuality of their own voices, answering questions such as: How high is my voice? How fast do I speak? Do I speak faster than users in the neighbouring city?
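The abstract does not say how the identified variants are turned into map locations. Purely as an illustration, one simple scheme is voting: each word's variant points to the localities where that variant is used, and the localities matching the most variants win, which naturally yields "one or more" points when there are ties. The `localize` function, the atlas structure and the toy data below are all hypothetical and are not the apps' actual method or data.

```python
from collections import Counter


def localize(answers, variant_map):
    """Return the localities that share the highest number of the user's variants.

    answers:     {word: variant identified for the user}
    variant_map: {word: {variant: set of localities using it}}  (hypothetical atlas)
    Ties are all returned (sorted), hence "one or more" points; no match gives [].
    """
    votes = Counter()
    for word, variant in answers.items():
        for locality in variant_map.get(word, {}).get(variant, ()):
            votes[locality] += 1
    if not votes:
        return []
    best = max(votes.values())
    return sorted(loc for loc, n in votes.items() if n == best)


# Invented toy atlas with placeholder word, variant and locality names.
atlas = {
    "w1": {"v1": {"A", "B"}, "v2": {"C"}},
    "w2": {"v1": {"B", "C"}, "v2": {"A"}},
}
print(localize({"w1": "v1", "w2": "v1"}, atlas))
```

A real system would also need to weigh unreliable recognitions and geographically smooth the votes; this sketch only shows why a single set of answers can map to several equally likely points.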