Toward Real-World Voice Disorder Classification

IEEE Transactions on Biomedical Engineering

https://doi.org/10.1109/TBME.2023.3270532

Abstract

Objective: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Automatic classification systems that run at home are therefore desirable for people who lack access to clinical disease assessment. However, the performance of such systems may be weakened by constrained resources and by the domain mismatch between clinical data and noisy real-world data. Methods: This study develops a compact and domain-robust voice disorder classification system that classifies utterances as healthy, neoplasm, or benign structural disease. The proposed system uses a feature extractor composed of factorized convolutional neural networks and then applies domain adversarial training to reconcile the domain mismatch by extracting domain-invariant features. Results: The unweighted average recall in the noisy real-world domain improved by 13% and remained at 80% in the clinic domain with only slight degradation. The domain mismatch was effectively eliminated. Moreover, the proposed system reduced the usage of both memory and computation by over 73.9%. Conclusion: By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources. The promising results confirm that the proposed system can significantly reduce resource consumption and improve classification accuracy by accounting for the domain mismatch. Significance: To the best of our knowledge, this is the first study that jointly considers model compression and noise robustness for real-world voice disorder classification. The proposed system is intended for embedded systems with limited resources.
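The results above are reported as unweighted average recall (UAR), which is the mean of the per-class recalls, so each class counts equally regardless of how many utterances it contains. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `unweighted_average_recall` and the toy labels are assumptions made for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (hypothetical helper, not from the paper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall is computed per class, then averaged with equal class weights.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels with an imbalanced class distribution (assumption).
y_true = ["healthy"] * 8 + ["neoplasm"] * 2 + ["structural"] * 2
y_pred = ["healthy"] * 8 + ["healthy", "neoplasm"] + ["structural", "healthy"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On this toy data the plain accuracy is higher than the UAR because the majority class dominates the accuracy, which is one reason UAR is a common choice when classes are imbalanced.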
