Building a Rich Arabic Speech Database

Mansour Alsulaiman

doi:10.1109/AMS.2011.29

2011

https://doi.org/10.1109/AMS.2011.29

6 pages

Abstract

The availability of databases is a necessity in the speech processing field, yet few publicly available databases exist for the Arabic language. In this paper we describe a rich database for the Arabic language. The database is rich in many dimensions: text, environments, microphone type, number of recording sessions, recording system, transmission channel, country of origin, and mother language. This richness makes the database an important resource for research in Arabic language processing and very useful in many speech processing tasks, such as speaker recognition, speech recognition, and accent identification. The speakers spoke in Modern Standard Arabic (MSA).

FAQs

What are the key features of the proposed Arabic speech database?

The text material comprises 940 sentences and 700 words, and the 240 speakers provide extensive speaker variability. The database encompasses recordings from multiple environments, such as soundproof rooms, offices, and restaurants.

How does the speaker selection impact database richness?

The database includes diverse speakers, comprising 137 Saudi males and 103 non-Saudi males from 27 nationalities, enhancing the generalizability of research findings. This diversity aids in examining linguistic variances across different Arabic accents.

What methodologies were used for recording the speech data?

Recording occurred in three sessions, utilizing high-quality and medium-quality microphones in multiple environments to assess varied acoustic influences. The average recording times per speaker were 19, 18, and 16 minutes for the office, soundproof, and cafeteria environments, respectively.
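One way to picture this recording design is as a per-speaker checklist over environments. The sketch below is illustrative only: the environment names and average durations are taken from the answer above, while the `RecordingLog` class, its fields, the `expected_minutes` helper, and the speaker ID are hypothetical and are not described in the paper.

```python
from dataclasses import dataclass, field

# Average minutes per speaker in each environment, as quoted above.
AVERAGE_MINUTES = {"office": 19, "soundproof": 18, "cafeteria": 16}


@dataclass
class RecordingLog:
    """Hypothetical log of which environments each speaker has recorded in."""
    done: dict = field(default_factory=dict)  # speaker_id -> set of environments

    def mark(self, speaker_id: str, environment: str) -> None:
        """Record that a speaker has completed one environment."""
        if environment not in AVERAGE_MINUTES:
            raise ValueError(f"unknown environment: {environment}")
        self.done.setdefault(speaker_id, set()).add(environment)

    def missing(self, speaker_id: str) -> set:
        """Environments the speaker has not yet recorded in."""
        return set(AVERAGE_MINUTES) - self.done.get(speaker_id, set())


def expected_minutes(environments) -> int:
    """Sum of the quoted average durations for the given environments."""
    return sum(AVERAGE_MINUTES[e] for e in environments)


log = RecordingLog()
log.mark("spk001", "office")       # "spk001" is a made-up speaker ID
log.mark("spk001", "soundproof")
print(log.missing("spk001"))               # environments still to record
print(expected_minutes(AVERAGE_MINUTES))   # 19 + 18 + 16 = 53
```

Under these averages, a speaker who records in all three environments contributes roughly 53 minutes of audio.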

How is phonetic balancing achieved in the database?

The text corpus was randomized over 20 iterations to ensure that the sub-lists contain all phonemes and are phonetically diverse, with the aim of a balanced representation. Optimal subsets were selected based on the consistent presence of phonemes across recordings.
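The answer above does not spell out the selection procedure, so the following is only one plausible reading of it: shuffle the sentence list a fixed number of times, split each ordering into sub-lists, and keep the ordering in which the most sub-lists cover the full phoneme inventory. All function names, the round-robin split, and the toy phoneme sets are assumptions made for illustration; only the figure of 20 iterations comes from the text.

```python
import random


def split_into_sublists(items, n_lists):
    """Deal items round-robin into n_lists sub-lists."""
    return [items[i::n_lists] for i in range(n_lists)]


def covering_sublists(sentences, phonemes_of, inventory, n_lists):
    """Count the sub-lists whose sentences together contain every phoneme."""
    count = 0
    for sub in split_into_sublists(sentences, n_lists):
        seen = set()
        for s in sub:
            seen |= phonemes_of[s]
        if inventory <= seen:
            count += 1
    return count


def best_random_split(phonemes_of, n_lists, iterations=20, seed=0):
    """Try `iterations` random orderings; keep the one with the most covering sub-lists."""
    rng = random.Random(seed)
    sentences = sorted(phonemes_of)
    inventory = set().union(*phonemes_of.values())
    best_order, best_score = None, -1
    for _ in range(iterations):
        order = sentences[:]
        rng.shuffle(order)
        score = covering_sublists(order, phonemes_of, inventory, n_lists)
        if score > best_score:
            best_order, best_score = order, score
    return split_into_sublists(best_order, n_lists), best_score


# Toy data: sentence IDs mapped to made-up phoneme sets (not from the corpus).
toy = {
    "s1": {"a", "b"}, "s2": {"c", "d"},
    "s3": {"a", "c"}, "s4": {"b", "d"},
}
sublists, score = best_random_split(toy, n_lists=2)
print(sublists, score)
```

Random restarts are a simple baseline here; a greedy or set-cover style selection would be an alternative if coverage had to be guaranteed rather than searched for.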

What practical implications does the database have for Arabic speech technology?

The database can significantly enhance Arabic speech recognition systems by providing a well-validated corpus for training and testing, particularly for the effects of noise and recording conditions. It supports research in recognizing variances from different Arabic dialects and recording setups.

Figures (2)

Figure 1. Microphones and mobile phone setup in office environment.

Figure 2. Setup of text reading from the screen.

[…] kept logs to make sure that every speaker recorded in all three environments.

