Speech Recognition System

Surabhi Bansal

2000

Abstract

Speech recognition applications are becoming increasingly useful. Various interactive, speech-aware applications are available on the market, but they are usually designed for and run on traditional general-purpose computers. With the growing need for embedded computing and the demand for emerging embedded platforms, speech recognition systems (SRS) need to be available on

FAQs

What limitations exist regarding the audio input format for PocketSphinx?

PocketSphinx accepts only mono (single-channel) audio, whereas the PXA27x platform provides stereo input, and this mismatch led to recognition failures. The limitation forced the use of pre-recorded audio files instead of live voice input.
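Because PocketSphinx expects single-channel audio, one way to feed it stereo input is to downmix the two channels before decoding. The project itself did not take this route (it fell back to pre-recorded files), so this is only an illustrative sketch. It assumes interleaved signed 16-bit PCM in left/right order, and `downmix_stereo_s16` is a hypothetical helper, not part of PocketSphinx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PocketSphinx API): average interleaved stereo
 * 16-bit PCM (L,R,L,R,...) into a mono buffer. `frames` is the number of
 * stereo sample pairs, so `in` holds 2 * frames samples and `out` must
 * have room for `frames` samples. */
void downmix_stereo_s16(const int16_t *in, int16_t *out, size_t frames)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) {
        /* Sum in 32 bits so two full-scale samples cannot overflow. */
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
}
```

Averaging keeps the contribution of both channels; taking only `in[2 * i]` would be a simpler alternative if the second channel carries no useful signal. Since output index `i` is written only after input indices `2i` and `2i + 1` have been read, the function also works in place with `out == in`.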

How did noise interference impact the speech recognition system's performance?

Background noise severely degrades recognition accuracy because it makes phonemes harder to identify. This issue has been highlighted in applications such as robot communication, where environmental noise during operation poses a challenge.

What challenges were faced when integrating MPlayer with the speech recognition system?

The integration depended on Perl scripts that were incompatible with the PXA27x, and conflicting audio channel settings prevented real-time operation. These problems led to reliance on batch-mode processing.

What evidence supports the choice of PocketSphinx for embedded systems?

PocketSphinx demonstrated superior speed and functionality on the embedded platform and recognized both digits and words effectively. According to the project findings, it is the first open-source system capable of real-time, medium-vocabulary speech recognition.

What are the main goals for future enhancements of the speech recognition system?

Future work aims to optimize the PocketSphinx dictionary, enable real-time processing in live mode, and improve adaptability across different audio channel configurations. Addressing these limitations could significantly improve recognition capabilities.

Figures (2)

Speech recognition basically means talking to a computer, having it recognize what we are saying, and doing this in real time. The process works as a pipeline that converts PCM (Pulse Code Modulation) digital audio from a sound card into recognized speech. The elements of the pipeline are:

> Transform the PCM digital audio into a better acoustic representation: the input to the speech recognizer is a stream of amplitudes sampled about 16,000 times per second, but audio in this form is not directly useful to the recognizer. Fast Fourier Transforms (FFTs) are therefore used to produce graphs of the frequency components describing the sound heard during each 1/100th of a second. Each such sound is then identified by matching it to the closest entry in a database of these graphs, producing a number, called the “feature number”, that describes the sound.
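The final step of this stage, matching each 1/100 s spectrum to its closest stored entry, is essentially a vector-quantization lookup. This minimal C sketch assumes the frequency components of each frame have already been computed (for example by an FFT) and that "closest" means smallest squared Euclidean distance. `N_BINS`, the codebook contents, and the function names are hypothetical and chosen only for illustration. The framing constants simply restate the text: 16,000 samples per second split into 1/100 s frames gives 160 samples per frame.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Framing arithmetic from the text: 16,000 samples/s, one frame per 1/100 s. */
#define SAMPLE_RATE 16000
#define FRAMES_PER_SECOND 100
#define FRAME_LEN (SAMPLE_RATE / FRAMES_PER_SECOND) /* 160 samples */

/* Hypothetical spectrum size: number of frequency components describing
 * one frame. Kept tiny for illustration; a real front end uses more. */
#define N_BINS 4

/* Number of whole 1/100 s frames in a buffer of n_samples samples. */
size_t frame_count(size_t n_samples)
{
    return n_samples / FRAME_LEN;
}

/* Return the index of the codebook entry closest (squared Euclidean
 * distance, an assumption) to one frame's spectrum. The index plays the
 * role of the "feature number". */
int nearest_codeword(const double spectrum[N_BINS],
                     const double codebook[][N_BINS], int n_codes)
{
    assert(n_codes > 0);
    int best = 0;
    double best_dist = DBL_MAX;
    for (int c = 0; c < n_codes; c++) {
        double dist = 0.0;
        for (int k = 0; k < N_BINS; k++) {
            double diff = spectrum[k] - codebook[c][k];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}
```

In a real system the codebook would be trained offline on speech data (k-means is one common choice), and the index returned for each frame would be the per-frame feature number passed on to the recognizer.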

The target hardware platform for this work is the PXA27x Mainstone board, which serves as a prototype for handheld devices. The Mainstone board has a 208 MHz Intel PXA27x processor, 64 MB of SDRAM, 32 MB of flash memory, and a quarter-VGA color LCD screen. We chose this device because it runs the GNU/Linux® operating system, which simplified the initial port of our system. To build our system, we used a GCC 3.4.3 cross-compiler built with the crosstool script. Each component of our system is described below.

