
Figure 4 – uploaded by Hassan Satori


Experiments with speech recognition in noisy conditions showed that performance degradation was observed when recognition was tested at 5 dB, and that the recognition rate was hardly affected when the SNR exceeded 25 dB for both noise types. However, a major degradation of accuracy was observed when the speech signal was distorted with noise and the SNR exceeded 35 dB. In this investigation, we found that the digits which include the letter S are affected more than the other digits across different SNR values. In future work, we will try to improve the performance of this ASR system by combining HMMs with deep learning techniques.

… 1,67% in the noisy environment at SNR 15 dB, 25 dB, 35 dB and 45 dB, respectively. The confusion of Krad with Kuz gradually decreases as the noise level increases. A similar situation occurred with all the digits used. Further, lower rates were observed for Sin, Smmus, Sdes and Sa: these digits obtained lower accuracies than the others at every SNR used, and the influence of the noise was clearly observed at 25 dB.

Figure 5 shows the system recognition rates for speech corrupted by grinder noise at several of the SNR values used in the first experiment. The highest accuracy was obtained for the Krad digit, and the Amya, Kuz, Tam and Tza digits maintain recognition rates above 70%, while the other digits reach accuracies below 70% up to SNR 5 dB. For SNR 15 dB and above, recognition decreases again for all digits. The studied digits obtained lower accuracy from 25 dB onward, and very low accuracy was achieved at 35 dB. Moreover, we noted that the digits which contain the letter S are not recognized at 35 dB, and these digits show a very high dissimilarity compared to all the other spoken digits. The most noise-resistant digit is Krad, owing to its strong consonants and its number of syllables.
Figure 5
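The procedure used to produce the car-noise and grinder-noise test sets at each SNR is not given in this section. One common way to build such sets, shown here as a minimal Python sketch rather than the authors' method, is to scale a noise recording so that 10·log10(P_speech / P_noise) equals the target SNR and then add it to the clean utterance. The function names, the SNR definition and the synthetic tone and white-noise stand-ins are assumptions for illustration only.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech`, scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(speech):
        # Repeat a short noise recording until it covers the whole utterance.
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    # Gain that sets the noise power to p_speech / 10**(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

def measured_snr_db(speech, noisy) -> float:
    """SNR of `noisy` relative to the clean reference `speech`, in dB."""
    speech = np.asarray(speech, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - speech
    return float(10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2)))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # stand-in for a clean digit utterance
noise = rng.standard_normal(8000)              # stand-in for a car or grinder noise recording
for snr in (5, 15, 25, 35, 45):
    noisy = mix_at_snr(speech, noise, snr)
    print(f"target {snr} dB -> measured {measured_snr_db(speech, noisy):.2f} dB")
```

Measuring the SNR of each mixed signal against its clean reference is a cheap sanity check that every noisy copy of the corpus was produced at the intended level. The power-ratio definition used here is an assumption; the exact convention behind the reported SNR values is not stated in this section.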

Related Figures (5)

Fig. 1 Architecture of an automatic speech recognition system

Fig. 2 Hidden Markov Model (HMM) with five states
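As a rough illustration of how a five-state HMM such as the one in Fig. 2 scores an utterance, the Python sketch below builds a left-to-right transition matrix (each state either loops or advances one step), computes log P(O | λ) with the forward algorithm in the log domain, and picks the word model with the highest likelihood, as an isolated-digit recogniser does. The left-to-right topology, the self-loop probability of 0.6, the discrete emission tables and the toy models word_a and word_b are all assumptions; HMM-based recognisers of this kind typically use Gaussian-mixture emissions over acoustic feature vectors rather than discrete symbols.

```python
import numpy as np

N_STATES = 5  # five emitting states, as in Fig. 2

def left_to_right_transitions(n_states: int = N_STATES, p_stay: float = 0.6) -> np.ndarray:
    """Left-to-right transition matrix: each state either loops or moves one step right."""
    a = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        a[i, i] = p_stay
        a[i, i + 1] = 1.0 - p_stay
    a[-1, -1] = 1.0  # the last state can only loop
    return a

def log_forward(log_pi, log_a, log_b, obs) -> float:
    """Return log P(obs | model) using the forward algorithm in the log domain.

    log_pi: (N,) initial log-probabilities; log_a: (N, N) transition log-probabilities;
    log_b: (N, M) discrete emission log-probabilities; obs: symbol indices in [0, M).
    """
    alpha = log_pi + log_b[:, obs[0]]
    for o in obs[1:]:
        # alpha_j(t) = log sum_i exp(alpha_i(t-1) + log a_ij) + log b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_a, axis=0) + log_b[:, o]
    return float(np.logaddexp.reduce(alpha))

def make_model(emissions: np.ndarray, p_stay: float = 0.6):
    """Build (log_pi, log_a, log_b) for a left-to-right HMM that always starts in state 0."""
    n_states = emissions.shape[0]
    pi = np.zeros(n_states)
    pi[0] = 1.0
    with np.errstate(divide="ignore"):  # log(0) -> -inf is intended for forbidden moves
        return np.log(pi), np.log(left_to_right_transitions(n_states, p_stay)), np.log(emissions)

def recognise(models: dict, obs) -> str:
    """Isolated-word decision: return the name of the model with the highest likelihood."""
    return max(models, key=lambda word: log_forward(*models[word], obs))

# Hypothetical two-symbol emission tables, one row per state.
rising = np.array([[0.9, 0.1], [0.7, 0.3], [0.5, 0.5], [0.3, 0.7], [0.1, 0.9]])
toy_models = {"word_a": make_model(rising), "word_b": make_model(rising[::-1])}
print(recognise(toy_models, [0, 0, 0, 1, 1, 1]))
```

Working with log-probabilities and logaddexp keeps the forward recursion from underflowing on long utterances, which would otherwise happen quickly when many per-frame probabilities are multiplied.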

Table 1 System parameters

… saved into one “.wav” file, and sometimes into up to four “.wav” files, depending on the number of sessions the speaker needed to finish recording. Because it is time-consuming to save every single recording as soon as it is uttered, these larger “.wav” files had to be segmented manually into smaller ones, each holding a single recording of a single word, and the resulting files were manually sorted into the corresponding directories. The corpus consists of 10 repetitions of every digit produced by each speaker, giving 4000 tokens in total. During the recording session, the waveform of each utterance was displayed to ensure that the entire word was included in the recorded signal. Wrongly pronounced utterances were discarded, and only correct utterances were kept in the database. The speakers' voices were recorded with WaveSurfer (Hamidi et al. 2018).
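The session recordings were cut into single-word files by hand. Once the word boundaries are known, the cutting itself could be scripted; the sketch below uses Python's standard wave module to copy each (start, end) interval into its own file. The function name split_wav, the file-naming scheme and the boundary values are hypothetical, and the one-second silent file only stands in for a real session recording.

```python
import tempfile
import wave
from pathlib import Path

def split_wav(src: Path, segments, out_dir: Path, prefix: str) -> list:
    """Copy each (start_s, end_s) interval of `src` into its own WAV file.

    `segments` holds hypothetical word boundaries in seconds, e.g. noted while
    inspecting the waveform; returns the paths of the files written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        for idx, (start_s, end_s) in enumerate(segments):
            start = int(round(start_s * rate))
            end = int(round(end_s * rate))
            reader.setpos(start)  # jump to the first frame of the word
            frames = reader.readframes(end - start)
            dest = out_dir / f"{prefix}_{idx:02d}.wav"
            with wave.open(str(dest), "wb") as writer:
                writer.setparams(params)  # same channels, sample width and rate
                writer.writeframes(frames)  # header frame count is corrected on write
            written.append(dest)
    return written

# Usage with a stand-in session file: one second of 16-bit mono silence at 16 kHz.
tmp_dir = Path(tempfile.mkdtemp())
session = tmp_dir / "speaker01_session1.wav"
with wave.open(str(session), "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)
parts = split_wav(session, [(0.10, 0.45), (0.55, 0.90)], tmp_dir / "kuz", "kuz")
print([p.name for p in parts])
```

Copying the source parameters with setparams keeps every segment at the sampling rate and sample width of its session, so the files can go straight into per-digit directories.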

Table 2 Overall recognition rates
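Recognition rates like those in Table 2 are usually the percentage of test tokens recognised correctly. Assuming a confusion matrix whose rows are the spoken digits and whose columns are the recognised ones, the Python sketch below derives per-digit and overall rates and reports the largest off-diagonal entry, the kind of confusion discussed for Krad and Kuz. The counts are invented for illustration and are not the values behind Table 2; the function names are hypothetical.

```python
import numpy as np

def recognition_rates(confusion, labels):
    """Per-digit and overall recognition rates (%) from a confusion matrix.

    confusion[i][j] counts test tokens of labels[i] that were recognised as labels[j].
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    per_digit = 100.0 * correct / confusion.sum(axis=1)
    overall = 100.0 * correct.sum() / confusion.sum()
    return dict(zip(labels, per_digit.tolist())), float(overall)

def most_confused_pair(confusion, labels):
    """Return (spoken, recognised, count) for the largest off-diagonal entry."""
    off = np.array(confusion, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(off, -1.0)  # ignore correct recognitions
    i, j = np.unravel_index(np.argmax(off), off.shape)
    return labels[i], labels[j], int(off[i, j])

# Hypothetical counts for three digits (20 test tokens each), not results from the paper.
toy_labels = ["Krad", "Kuz", "Sin"]
toy_confusion = [
    [18, 2, 0],
    [3, 16, 1],
    [1, 4, 15],
]
per_digit, overall = recognition_rates(toy_confusion, toy_labels)
print(per_digit, round(overall, 2))
print(most_confused_pair(toy_confusion, toy_labels))
```

Normalising each row by its own total keeps the per-digit rates comparable even if some digits have fewer valid test tokens after wrongly pronounced utterances are discarded.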

Fig. 3 a Spectrogram of the Kuz digit in a normal environment. b Spectrogram of the Kuz digit at 25 dB SNR under car noise. c Spectrogram of the Kuz digit at 25 dB SNR under grinder noise
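Spectrograms like those in Fig. 3 are typically obtained from a short-time Fourier transform. The NumPy sketch below frames the signal, applies a Hamming window and converts the FFT magnitude to decibels. The 25 ms frame, 10 ms hop, 16 kHz sampling rate and the tone-plus-noise test signal are assumptions, not parameters taken from the paper.

```python
import numpy as np

def spectrogram_db(signal, sample_rate=16000, frame_ms=25.0, hop_ms=10.0):
    """Return (frame times, bin frequencies, log-magnitude in dB) of a Hamming-windowed STFT."""
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[k * hop : k * hop + frame_len] * window for k in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centres in s
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return times, freqs, 20.0 * np.log10(magnitude + 1e-10)  # epsilon avoids log(0)

rng = np.random.default_rng(1)
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 1000.0 * t)  # stand-in for a clean utterance
noisy_tone = tone + 0.05 * rng.standard_normal(t.size)  # crude stand-in for a noisy version
times, freqs, clean_db = spectrogram_db(tone)
_, _, noisy_db = spectrogram_db(noisy_tone)
print(clean_db.shape, noisy_db.shape)
```

Comparing the clean and noisy matrices bin by bin shows how added noise raises the floor of the spectrogram, which is what panels b and c display relative to panel a.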
