Speech Recognition System
2000
Abstract
Speech recognition applications are becoming increasingly useful, and various interactive speech-aware applications are available on the market. However, they are usually designed for and run on traditional general-purpose computers. With the growing need for embedded computing and the demand for emerging embedded platforms, speech recognition systems (SRS) must also be available on
FAQs
What limitations exist regarding the audio input format for PocketSphinx?
PocketSphinx only accepts mono channel audio despite the PXA27x platform providing stereo input, leading to recognition failures. This limitation necessitated the use of pre-recorded audio files instead of live voice input.
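The channel mismatch described above is commonly handled by downmixing stereo to mono before recognition. The following is a minimal sketch of that idea, not the approach used in the project (which fell back to pre-recorded files). It assumes interleaved 16-bit signed PCM samples in native byte order; the function name `stereo_to_mono` is hypothetical and chosen for illustration only.

```python
import array


def stereo_to_mono(frames: bytes) -> bytes:
    """Downmix interleaved 16-bit signed stereo PCM to mono.

    Assumption: samples are laid out as L, R, L, R, ... in native byte order.
    Each output sample is the floor of the average of one left/right pair.
    """
    samples = array.array("h")  # "h" = signed 16-bit integer
    samples.frombytes(frames)
    if len(samples) % 2 != 0:
        raise ValueError("stereo data must contain an even number of samples")
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples), 2)),
    )
    return mono.tobytes()
```

Averaging the two channels keeps every result within the 16-bit range, so no clipping step is needed. Whether downmixed live input would be fast enough on the PXA27x is not something the source establishes.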
How did noise interference impact the speech recognition system's performance?
Background noise severely degrades recognition accuracy because it complicates phoneme identification. The issue was highlighted in application contexts such as robot communication, where environmental noise during operation poses challenges.
What challenges were faced during the integration of MPlayer with the speech recognition system?
The integration depended on Perl scripts that were incompatible with the PXA27x, and conflicts between audio channel settings hindered real-time operation. These problems led to reliance on batch-mode processing.
What evidence supports the choice of PocketSphinx for embedded systems?
PocketSphinx demonstrated superior speed and functionality on embedded platforms, recognizing both digits and words effectively. According to the project findings, it is the first open-source system capable of real-time speech recognition for medium vocabulary.
What are the main goals for future enhancements of the speech recognition system?
Future work aims to optimize the PocketSphinx dictionary, enable real-time processing for live mode, and enhance adaptability across different audio channels. Addressing these limitations could significantly improve recognition capabilities.