Mobile phone identification using recorded speech signals

2014 19th International Conference on Digital Signal Processing (DSP), 2014

https://doi.org/10.1109/ICDSP.2014.6900732

Abstract

In this paper, we elaborate on mobile phone identification from recorded speech signals. The goal is to extract intrinsic traces related to the mobile phone used to record a speech signal. Mel-frequency cepstral coefficients (MFCCs) are extracted from each recorded speech signal at the frame level. The sequence of MFCC vectors extracted from each recording device is used to train a Gaussian mixture model (GMM) with diagonal covariance matrices. A Gaussian supervector, obtained by concatenating the GMM mean vectors and the main diagonals of the covariance matrices, serves as a template for each device. Experiments were conducted on a database of 21 mobile phones of various models from 7 different brands. This database, called MOBIPHONE, was collected by recording 10 utterances spoken by 12 male and 12 female speakers randomly chosen from the TIMIT database. Three commonly used classifiers were employed: Support Vector Machines with different kernels, a Radial Basis Function neural network, and a Multi-Layer Perceptron. The best identification accuracy (97.6%) was obtained by the Radial Basis Function neural network.
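As a rough illustration of the pipeline described in the abstract, the sketch below maps a recording to a Gaussian supervector and feeds the resulting features to an SVM classifier. The use of librosa and scikit-learn, as well as the MFCC order, mixture count, and kernel choice, are assumptions made for illustration; they are not specified in the abstract.

```python
# Minimal sketch of the Gaussian-supervector pipeline described in the abstract.
# Library choices (librosa, scikit-learn) and the constants below are assumptions.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

N_MFCC = 23   # assumed number of cepstral coefficients per frame
N_MIX = 16    # assumed number of Gaussian components

def gaussian_supervector(wav_path):
    """Map one recording to a fixed-length Gaussian supervector."""
    signal, sr = librosa.load(wav_path, sr=None)
    # Frame-level MFCCs: (n_mfcc, n_frames) -> (n_frames, n_mfcc)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T
    # Fit a diagonal-covariance GMM on the MFCC sequence of this recording
    gmm = GaussianMixture(n_components=N_MIX, covariance_type="diag",
                          max_iter=200, random_state=0).fit(mfcc)
    # Supervector: concatenated mean vectors and covariance main diagonals
    return np.concatenate([gmm.means_.ravel(), gmm.covariances_.ravel()])

# Device identification: one supervector per recording, labeled by phone model.
# wav_paths and labels would come from a MOBIPHONE-style corpus (hypothetical here).
# X = np.vstack([gaussian_supervector(p) for p in wav_paths])
# clf = SVC(kernel="rbf").fit(X, labels)   # one of the classifiers mentioned
# predicted_phone = clf.predict(X[:1])
```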
