The goal of this article is to present the SphinxTrain training procedure and the modifications to it that yield accurate and robust speech recognition in a mobile GSM environment. Some modifications are based on effective preprocessing of the input data, combined with an optimal setting of the number of states per model, achieved by adjusting the number of tied states or the number of Gaussian mixtures. Another source of improved recognition rate is the 'optimal' setting of the speech decoder. Because this is a non-linear task that is not easily tractable mathematically and involves both real-valued and integer parameters, evolution strategies can be applied successfully (an 18.6% improvement in WER was observed compared to the original setting). All experiments were carried out on the Slovak speech database Mobildat, which contains recordings of 1100 speakers. The Sphinx4 recognition system was used to evaluate the trained models.
Biographical notes: Juraj Vojtko, born in 1981, received his MSc in telecommunications from the Slovak University of Technology in Bratislava, Slovakia, in 2005. Since 2009 he has been an assistant professor at the Institute of Telecommunications at the Faculty of Electrical Engineering and Information Technology of the Slovak University of Technology in Bratislava. He has also worked as a developer in the commercial sector in the area of communication and information systems since 2002. His research focuses on speech processing, specifically speech recognition and speaker identification and verification.
Juraj Kačur was born in Bratislava in 1976. He received his Master of Science degree in 2000 and his PhD in 2005 at the Faculty of Electrical Engineering and Information Technology of the Slovak University of Technology (FEI STU), Bratislava. Since 2001, he has been an assistant professor at the Institute of Telecommunications at FEI STU Bratislava. Between 2000 and 2001, he was with the Slovak Academy of Sciences, Department of Speech Analysis and Synthesis, where he participated in several projects. His research activities include signal processing, speech processing, speech recognition, speech detection, speaker identification, higher-order statistics, the wavelet transform, machine learning, ANN and HMM.
Gregor Rozinaj (M'97) received his MSc and PhD in telecommunications from the Slovak University of Technology, Bratislava, Slovakia, in 1981 and 1990, respectively. He has been a lecturer at the Institute of Telecommunications of the Slovak University of Technology since 1981. From 1992 to 1994, he worked on a research project devoted to speech recognition at the Alcatel Research Center in Stuttgart, Germany. From 1994 to 1996, he was employed as a researcher at the University of Stuttgart, Germany, working on a research project on automatic ship control. Since 1997, he has been Head of the DSP group at the Institute of Telecommunications of the Slovak University of Technology, Bratislava. Since 1998, he has been an Associate Professor at the same institute. He is an author of three US and European patents on digital speech recognition and one Czechoslovak patent on fast algorithms for DSP.