Characterising depressed speech for classification
Abstract
Depression is a serious psychiatric disorder that affects mood, thoughts, and the ability to function in everyday life. This paper investigates the characteristics of depressed speech for the purpose of automatic classification, by analysing how different speech features affect the classification results. We analysed voiced, unvoiced and mixed speech in order to gain a better understanding of depressed speech and to bridge the gap between physiological and affective computing studies. This understanding may ultimately lead to an objective affective sensing system that supports clinicians in diagnosing and monitoring clinical depression. The characteristics of depressed speech were statistically analysed using ANOVA and linked to their classification results obtained with Gaussian mixture model (GMM) and support vector machine (SVM) classifiers. Features were extracted from the speech utterances of 30 clinically depressed patients and 30 gender-matched controls, and classified in a speaker-independent manner. Most feature classification results were consistent with their statistical characteristics, providing a link between physiological and affective computing studies. Classification using low-level features gave slightly better results than using their statistical functionals, which indicates a loss of information in the latter. We also found that mixed and unvoiced speech were at least as useful for detecting depression as voiced speech.
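To make the classification setup concrete, the sketch below shows a minimal speaker-independent comparison of GMM and SVM classifiers over per-utterance acoustic feature vectors. It is a hypothetical illustration using scikit-learn and synthetic data; the feature dimensionality, model settings, and data are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch (not the paper's pipeline): speaker-independent
# GMM vs. SVM classification of per-utterance acoustic feature vectors.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data: 60 speakers (30 depressed, 30 control), several utterances each,
# every utterance summarised by a fixed-length feature vector.
n_speakers, utt_per_speaker, n_feats = 60, 5, 20
X = rng.normal(size=(n_speakers * utt_per_speaker, n_feats))
speakers = np.repeat(np.arange(n_speakers), utt_per_speaker)
y = np.repeat((np.arange(n_speakers) < 30).astype(int), utt_per_speaker)

logo = LeaveOneGroupOut()  # hold out whole speakers: speaker independence
svm_correct = gmm_correct = total = 0

for train_idx, test_idx in logo.split(X, y, groups=speakers):
    scaler = StandardScaler().fit(X[train_idx])
    X_tr, X_te = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
    y_tr, y_te = y[train_idx], y[test_idx]

    # SVM: a single discriminative decision boundary over the features.
    svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
    svm_correct += np.sum(svm.predict(X_te) == y_te)

    # GMM: one generative model per class; assign each test utterance to
    # the class whose model gives the higher log-likelihood.
    gmms = [GaussianMixture(n_components=2, covariance_type="diag",
                            random_state=0).fit(X_tr[y_tr == c])
            for c in (0, 1)]
    scores = np.stack([g.score_samples(X_te) for g in gmms], axis=1)
    gmm_correct += np.sum(scores.argmax(axis=1) == y_te)
    total += len(y_te)

print(f"SVM accuracy: {svm_correct / total:.2f}")
print(f"GMM accuracy: {gmm_correct / total:.2f}")
```

Leave-one-speaker-out cross-validation keeps every utterance of the held-out speaker out of training, which is what makes the evaluation speaker-independent rather than speaker-dependent.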