Papers by Ricardo Córdoba
We present an analysis of eleven speakers (five women and six men) aimed at identifying the parameters most relevant to speaker identity. The parameters studied are F0, six formants, five bandwidths and four source parameters. Feature selection is based on linear discriminant analysis. Results show that the most relevant parameter is F0, followed by formant frequencies and open quotient.

Abstract- In this paper, we present advanced diagnosis and feedback tools to improve student software quality. After several years of quantitative analysis of the relationship between the assigned grades and certain software features, we have been able to characterize high-quality assembly software. With these results, we have defined new learning objectives by consensus among the instructors, and we have developed a set of automatic tools that help to supervise how well the objectives have been achieved and to feed this information back to the students throughout the course. We have successfully used these analysis tools in a new course, with a considerable improvement in software quality factors. In the 2003-2004 academic year, there were 54.7% more subroutines per program, with 48.7% fewer lines per subroutine and an increase of 43.6% in the use of the more complex addressing capabilities. This improvement in quality had a positive impact on student surveys. Index Terms- automa...
We present a system for understanding spoken communications in two languages, Spanish and English, for air traffic control. The architecture uses two recognizers in parallel plus a language detection module. The output of the recognizer for the selected language is passed to an understanding system, based on context-dependent rules, that extracts the key concepts. Keywords: recognition, multilingual, language modeling, understanding, air traffic control.
New teaching methodology for electronics and its adaptation to the European space for higher education
2011 Promotion and Innovation with New Technologies in Engineering Education (FINTDI 2011), 2011

Abstract. This paper presents the integration of a spoken dialogue system into an autonomous robot, conceived as an interactive element in a science museum, capable of giving guided tours and holding simple dialogues with visitors. To make its behaviour more appealing, the robot has been given features (such as gestural expressiveness and emotional speech synthesis) that humanize its interventions. The speech recognizer is a speaker-independent subsystem (it can recognize the speech of any person) that incorporates confidence measures to improve recognition performance, since they filter out a very large share of spurious speech. The understanding system uses a rule-based learning system, which allows it to infer explicit information from a set of examples without first having to write a grammar or a set of rules to guide the understanding module. These subsystems were previously evaluated on a voice-control task for a hi-fi system, using our robot as the interface, obtaining 95.9% of words correctly recognized and 92.8% of concepts recognized. As for the text-to-speech system, a set of segmental and prosodic modifications has been applied to a neutral voice, leading to the generation of emotions in the robot's synthesized voice, such as happiness, anger, sadness and surprise. The reliability of these emotions has been measured in several perceptual experiments, which yield identification rates above 70% for most emotions (87% for sadness, 79.1% for surprise). Keywords: speech recognition, confidence measures, emotional speech synthesis.

8th European Conference on Speech Communication and Technology (Eurospeech 2003)
In this paper we present a review and evaluation of some of the main methods used in variable frame rate (VFR) analysis, applied to speech recognition systems. The work found in the literature in this area usually deals with restricted conditions and scenarios. We have revisited the main algorithmic alternatives and evaluated them under the same experimental framework, so that we have been able to establish objective considerations for each of them and select the most adequate strategy. We also show to what extent VFR analysis is useful in its three main application scenarios, namely "reduction of computational load", "improve acoustic modelling" and "handling additive noise conditions in the time domain". From our evaluation on a difficult telephone large vocabulary task, we establish that VFR analysis does not significantly improve the results obtained using traditional fixed frame rate (FFR) analysis, except when additive noise is present in the database, especially at low SNRs.
6th International Conference on Spoken Language Processing (ICSLP 2000)
The use of multiple acoustic models has been reported to give great improvements on difficult speaker-independent tasks. In this paper, we apply this strategy to a flexible, large vocabulary, speaker-independent, isolated-word hypothesis generation system in a telephone environment with vocabularies of up to 10000 words. The new problem addressed here is how to efficiently integrate the multiple model scheme in the system: because of its bottom-up approach (phonetic string generation followed by a lexical access process), multiple possibilities arise (apart from the alternatives in the training stage), and it is not clear what combination would achieve the best results. The paper gives full details on every alternative, along with results showing actual improvements in the system.

4th European Conference on Speech Communication and Technology (Eurospeech 1995)
We present the development and characteristics of a basic ASR system for isolated digits in Spanish, used over the telephone line. We first introduce our initial idea, a basic discrete system, and then describe the improvements we made to increase the recognition rate at a low CPU cost (always considering its practical implementation as a real-time system). The most remarkable advances were obtained with: 1) Semicontinuous modelling. It is a more precise modelling, although more time consuming. 2) End-pointing with a neural network. 3) One-pass decoding with noise models. The intention of both 2 and 3 is to alleviate the effects of wrong end-pointing. 4) Parametrization using perceptual filters in frequency and filtering in the time domain (RASTA-PLP), with which we wanted to decrease the effect of telephone noise on our system.

Conference of the International Speech Communication Association, 2000
In this paper, we propose an approach for Stress Assignment in Spanish Proper Names, based on a Multi-Layer Perceptron (MLP). When assigning stress to a word, we first analyse each vowel in the word and then calculate a Stress-Confidence Measure for it, using an MLP. The system assigns the stress to the vowel with the highest stress-confidence measure. In this paper we present and analyse different alternatives for the inputs to the Multi-Layer Perceptron. In all cases, we consider the number of vowels in the name and the vowel position in the word (taking into account only the vowels in the analysed word). For the rest of the inputs, we consider a window of letters. These letters are obtained from the context of the vowel considered and from the word ending, in a similar way to [1]. We propose a Discrimination Measure to analyse the discrimination power of the different input configurations, validate this measure, and present the results obtained in each case. For the best configuration, 94.9% of proper names are correctly stressed (5.1% error rate). These results are compared to similar experiments using a memory-based learning approach (k-Nearest Neighbours).
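The decision rule described in this abstract (score every vowel, then stress the one with the highest confidence) can be illustrated with a minimal Python sketch. The function names and the `toy_score` scorer are hypothetical: the scorer is only a stand-in that always prefers the penultimate vowel, not the MLP from the paper, and it receives just the vowel position and the vowel count, not the letter window.

```python
from typing import Callable, List

VOWELS = set("aeiouáéíóú")

# Hypothetical scorer signature: (word, character index, vowel rank, vowel count) -> confidence
Scorer = Callable[[str, int, int, int], float]


def vowel_positions(word: str) -> List[int]:
    """Return the character indices of all vowels in the word."""
    return [i for i, ch in enumerate(word.lower()) if ch in VOWELS]


def assign_stress(word: str, score: Scorer) -> int:
    """Return the character index of the vowel with the highest confidence."""
    positions = vowel_positions(word)
    if not positions:
        raise ValueError("word has no vowels")
    n_vowels = len(positions)
    # Score each vowel independently, then keep the best-scoring one.
    scored = [(score(word, idx, rank, n_vowels), idx)
              for rank, idx in enumerate(positions)]
    _, best_index = max(scored)
    return best_index


def toy_score(word: str, char_index: int, vowel_rank: int, n_vowels: int) -> float:
    """Stand-in for the MLP (assumption): full confidence on the penultimate vowel."""
    target_rank = max(n_vowels - 2, 0)
    return 1.0 if vowel_rank == target_rank else 0.0


stressed = assign_stress("Ricardo", toy_score)
print(stressed, "Ricardo"[stressed])
```

In a real system, `toy_score` would be replaced by a trained model that also receives the letter window around the vowel and the word ending; the selection step itself would stay the same.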

Conference of the International Speech Communication Association, 1999
We propose the use of a new n-path binary tree search algorithm for vector quantization. Our goal is to reduce the complexity (processing time) of the vector quantizer while maintaining the quantization distortion. The algorithm has been applied to a DHMM-based isolated digit recognizer over the telephone and to a continuous speech system based on SCHMM, so we also give the recognition results for both of them. We have tested several alternatives for calculating the centroids of the higher levels of the tree. In all the experiments we have considered the following parameters for the evaluation: average distortion, same choice percentage, average distortion for the mistakes and processing time. Our reference has been standard quantization (computing the distance to all centroids). In this reference case the distortion was 220.9 and the processing time was 2.1 seconds. With the n-path binary tree search algorithm, we have obtained a processing time of 0.7 seconds with a similar distortion: 226.4. In the semicontinuous system, we have obtained a reduction of 71% in vector quantization processing time, while maintaining the word accuracy.
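The abstract does not spell out the algorithm, so the following Python sketch shows only one plausible reading of an n-path binary tree search: the codebook is arranged as a balanced binary tree, and at each level only the `n_paths` closest nodes are kept and expanded. The tree construction (pairing adjacent codewords, parent centroid taken as the mean of its two children) and all names are assumptions made for illustration, not the method of the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Node:
    centroid: Tuple[float, ...]
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    index: Optional[int] = None  # codebook index, set only on leaves


def build_tree(codebook: List[Vector]) -> Node:
    """Build a balanced tree by pairing adjacent nodes (assumed construction)."""
    size = len(codebook)
    if size == 0 or size & (size - 1):
        raise ValueError("codebook size must be a power of two")
    level = [Node(tuple(c), index=i) for i, c in enumerate(codebook)]
    while len(level) > 1:
        parents = []
        for a, b in zip(level[0::2], level[1::2]):
            mean = tuple((x + y) / 2 for x, y in zip(a.centroid, b.centroid))
            parents.append(Node(mean, a, b))
        level = parents
    return level[0]


def n_path_search(root: Node, x: Vector, n_paths: int) -> int:
    """Descend the tree keeping the n_paths closest nodes at each level."""
    frontier = [root]
    while frontier[0].index is None:
        children = [c for node in frontier for c in (node.left, node.right)]
        children.sort(key=lambda c: dist2(c.centroid, x))
        frontier = children[:n_paths]
    return frontier[0].index  # closest surviving leaf


def full_search(codebook: List[Vector], x: Vector) -> int:
    """Reference quantizer: compare against every centroid."""
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i], x))


codebook = [(0.0,), (4.0,), (5.0,), (20.0,)]
root = build_tree(codebook)
x = (6.0,)
print(n_path_search(root, x, 1), n_path_search(root, x, 2), full_search(codebook, x))
```

In this toy codebook a single path commits to the wrong branch at the first level, while two paths recover the same codeword as the full search, which is the trade-off between processing time and distortion that the abstract measures.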

2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
It is well known that the emotional state of a speaker usually alters the way she/he speaks. Although all the components of the voice can be affected by emotion in some statistically significant way, not all these deviations from a neutral voice are identified by human listeners as conveying emotional information. In this paper we have carried out several perceptual and objective experiments that show the relevance of prosody and segmental spectrum in the characterization and identification of four emotions in Spanish. A Bayes classifier has been used in the objective emotion identification task. Emotion models were generated as the contribution of every emotion to the build-up of a Universal Background Emotion Codebook. According to our experiments, surprise is primarily identified by humans through its prosodic rubric (in spite of some automatically identifiable segmental characteristics), while for anger the situation is just the opposite. Sadness and happiness need a combination of prosodic and segmental rubrics to be reliably identified.

Proceedings Frontiers in Education 35th Annual Conference
In this paper, we present advanced diagnosis and feedback tools to improve student software quality. After several years of quantitative analysis of the relationship between the assigned grades and certain software features, we have been able to characterize high-quality assembly software. With these results, we have defined new learning objectives by consensus among the instructors, and we have developed a set of automatic tools that help to supervise how well the objectives have been achieved and to feed this information back to the students throughout the course. We have successfully used these analysis tools in a new course, with a considerable improvement in software quality factors. In the 2003-2004 academic year, there were 54.7% more subroutines per program, with 48.7% fewer lines per subroutine and an increase of 43.6% in the use of the more complex addressing capabilities. This improvement in quality had a positive impact on student surveys.

Proceedings Frontiers in Education 35th Annual Conference
Project-Based Learning (PBL) is one of the most interesting instructional strategies in the field of technical degrees. However, it is especially complex and difficult to implement when applied to laboratory courses with a high student-to-faculty ratio. In this paper, we describe our six-year experience of teaching laboratory courses in electronics in the Telecommunication Engineering Studies (a degree that is a mixture of computer science and electronics engineering) after the adoption of the PBL philosophy, and the design and implementation of several strategies and tools to ease the administrative, teaching and learning tasks in these laboratories. Finally, we offer some evaluation results, showing that the adoption and combination of all our strategies and software tools actually works. Both the students' acceptance of the laboratory (3.7 on a 1-5 scale) and student performance are high (students obtain 78.3% of the maximum grade, on average).
This paper describes a new technique for combining the information from two different phonotactic systems in order to improve the results of an automatic language recognition system. The first system is based on posteriorgram counts used to generate i-vectors, and the second is a variant of the first that takes into account the most discriminative n-grams according to their occurrence in one language versus all the others. The proposed technique yields a relative improvement of 8.63% in C_avg on the evaluation data used for the ALBAYZIN 2012 LRE competition.
In this paper, we investigate the performance of different types of acoustic features in an automatic speech recognition (ASR) system in a telephone environment. Specifically, we explore two different alternatives for the design of the parameterization module. In the first, the characteristics of this module are chosen empirically or on the basis of psychoacoustic knowledge. In the second, these characteristics are determined through discriminative feature extraction, which allows joint optimization of the parameterizer and the classifier. Both strategies have been applied to parameterizers based on the wavelet transform, giving significant improvements in the system's recognition rate compared with conventional parameterizations based on the Fourier transform.

2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-8, Proceedings, 2009
Bayesian Networks (BNs) are suitable for mixed-initiative dialog modeling, allowing a more flexible and natural spoken interaction. This solution can be applied to identify the intention of the user considering the concepts extracted from the last utterance and the dialog context. Subsequently, in order to make a correct decision regarding how the dialog should continue, unnecessary, missing, wrong, optional and required concepts have to be detected according to the inferred goals. This information is useful to properly drive the dialog: prompting for missing concepts, clarifying wrong concepts, ignoring unnecessary concepts and retrieving those that are required and optional. This paper presents a novel BN approach where a single BN is obtained from N goal-specific BNs through a fusion process. The new fusion BN enables a single concept analysis which is more consistent with the whole dialog context.

Lecture Notes in Computer Science, 2009
The Web is changing the way people access and exchange information. Specifically in the teaching and learning environment, the traditional model of presence-based lectures is shifting towards Web-Based Learning. This new model draws on remote access systems, knowledge sharing, and student mobility. In this context, pedagogical strategies are also changing and, for instance, Project-Based Learning (PBL) is seen as a potential driver for growth and development in this arena. This study is focused on a PBL-oriented course with a Distributed Remote ACcess (DRAC) system. The objective is to analyze how quantitative methods can be leveraged to design and evaluate automatic diagnosis and feedback tools to assist students on quality-related pedagogical issues in DRAC-enabled PBL courses. The main conclusions derived from this study are correlation-based and reveal that the development of automatic quality assessment and feedback requires further research.
IEEE Transactions on Audio, Speech, and Language Processing, 2012
Two new features have been proposed and used in the Rich Transcription Evaluation 2009 by the Universidad Politécnica de Madrid, which outperform the results of the baseline system. One of the features is the intensity channel contribution, a feature related to the location of the speaker. The second feature is the logarithm of the interpolated fundamental frequency. It is the first time that both features are applied to the clustering stage of multiple distant microphone meeting diarization. It is shown that the inclusion of both features yields relative improvements over the baseline of 15.36% and 16.71% on the development set and the RT09 set, respectively. If we consider speaker errors only, the relative improvement is 23% and 32.83% on the development set and the RT09 set, respectively.
… de Lenguaje Natural, 2010

In the context of large vocabulary speech recognition systems, it is of major importance to accurately model the allophonic variations to be faced in a real-world task. Evaluating which variants actually improve system performance is crucial, as it determines the acceptance of the pronunciation alternatives used. Traditional approaches in this direction use different criteria and, typically, evaluation only considers the global impact of the augmented dictionaries on the error rate, which leads to little further insight into the extent to which the proposed variations are actually working. Our proposal in this paper is to also evaluate the marginal improvement due to every pronunciation variation used (initially restricted to rule-based variant generation), defining specific improvement metrics. We show experimentally how these metrics reveal the improvement achieved by the application of rules when dealing with certain pronunciations (or speakers in general), while their global impact on error rate may not be statistically significant.