Co-Speech Movement in Conversational Turn-Taking
https://doi.org/10.3389/FCOMM.2021.779814

Abstract
This study investigates co-speech movements as a function of conversational turn exchange type, the type of speech material at a turn exchange, and the interlocutor's role as speaker or listener. A novel interactive protocol that mixes conversation and (non-read) nursery rhymes elicits many speech turns and co-speech movements within dyadic speech interaction. To evaluate a large amount of data, we use the density of co-speech movement as a quantitative measure. Results indicate that both turn exchange type and participant role are associated with variation in movement density for head and brow co-speech movement. Brow and head movement becomes denser as speakers approach overlapping speech exchanges, indicating that speakers increase their movement density as an interruptive exchange approaches. Similarly, head movement generally increases after such overlapping exchanges. Lastly, listeners display a higher rate of co-speech movement than speakers, both at speech turns and remote from them. Brow and head movements generally behave similarly across speech material types, conversational roles, and turn exchange types. On the whole, the study demonstrates that the quantitative co-speech movement density measure advanced here is useful in the study of co-speech movement and turn-taking.
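The movement-density measure and the role/exchange-type analysis described above can be illustrated with a minimal sketch in R. This is an assumption-laden illustration, not the authors' analysis code: movement density is taken here to mean annotated movement events per second within an analysis interval, the toy data table and all of its column names (participant, role, exchange_type, n_events, interval_dur) are hypothetical, and the model is a generic linear mixed-effects fit using lme4 and lmerTest, packages the reference list indicates were used (Bates et al., 2015; Kuznetsova et al., 2014).

```r
# Illustrative sketch only (not the authors' code).
library(dplyr)
library(lme4)
library(lmerTest)

set.seed(1)
# Hypothetical annotation table: one row per analysis interval.
annotations <- tibble::tibble(
  participant   = rep(sprintf("P%02d", 1:6), each = 20),
  role          = sample(c("speaker", "listener"), 120, replace = TRUE),
  exchange_type = sample(c("smooth", "overlap"), 120, replace = TRUE),
  n_events      = rpois(120, lambda = 4),           # annotated head/brow movement events
  interval_dur  = runif(120, min = 1.5, max = 3.0)  # interval duration in seconds
)

# Movement density: events per second within each interval.
density_data <- annotations %>%
  mutate(movement_density = n_events / interval_dur)

# Mixed-effects model: role and exchange type as fixed effects,
# random intercept per participant.
m <- lmer(movement_density ~ role * exchange_type + (1 | participant),
          data = density_data)
summary(m)
```

A density measure of this kind is convenient for large datasets because it summarizes movement activity per unit time without requiring frame-level alignment of individual gestures to speech events.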
References (84)
- Alibali, M. W., Kita, S., and Young, A. J. (2000). Gesture and the Process of Speech Production: We Think, Therefore We Gesture. Lang. Cogn. Process. 15, 593-613. doi:10.1080/016909600750040571
- Barbosa, A. V., Yehia, H. C., and Vatikiotis-Bateson, E. (2008). "Linguistically Valid Movement Behavior Measured Non-invasively," in Auditory Visual Speech Processing. Editors R. Göcke, P. Lucey, and S. Lucey (Queensland, Australia), 173-177.
- Barkhuysen, P., Krahmer, E., and Swerts, M. (2008). The Interplay between the Auditory and Visual Modality for End-Of-Utterance Detection. The J. Acoust. Soc. America 123, 354-365. doi:10.1121/1.2816561
- Barthel, M., Meyer, A. S., and Levinson, S. C. (2017). Next Speakers Plan Their Turn Early and Speak after Turn-Final "Go-Signals". Front. Psychol. 8, 1-10. doi:10.3389/fpsyg.2017.00393
- Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Soft. 67 (1), 1-48. doi:10.18637/jss.v067.i01
- Bavelas, J. B., Chovil, N., Coates, L., and Roe, L. (1995). Gestures Specialized for Dialogue. Pers. Soc. Psychol. Bull. 21, 394-405. doi:10.1177/0146167295214010
- Bavelas, J. B., Chovil, N., Lawrie, D. A., and Wade, A. (1992). Interactive Gestures. Discourse Process. 15, 469-489. doi:10.1080/01638539209544823
- Bavelas, J. B., Coates, L., and Johnson, T. (2002). Listener Responses as a Collaborative Process: The Role of Gaze. J. Commun. 52, 566-580. doi:10.1111/j.1460-2466.2002.tb02562.x
- Bavelas, J., Gerwing, J., Sutton, C., and Prevost, D. (2008). Gesturing on the Telephone: Independent Effects of Dialogue and Visibility. J. Mem. Lang. 58, 495-520. doi:10.1016/j.jml.2007.02.004
- Boersma, P., and Weenink, D. (2016). Praat: Doing Phonetics by Computer. Available at: http://www.praat.org/.
- Bögels, S., and Torreira, F. (2015). Listeners Use Intonational Phrase Boundaries to Project Turn Ends in Spoken Interaction. J. Phonetics 52, 46-57. doi:10.1016/j.wocn.2015.04.004
- Bolinger, D. (1983). Intonation and Gesture. Am. Speech 58, 156-174. doi:10.2307/455326
- Borràs-Comes, J., Kaland, C., Prieto, P., and Swerts, M. (2014). Audiovisual Correlates of Interrogativity: A Comparative Analysis of Catalan and Dutch. J. Nonverbal Behav. 38, 53-66. doi:10.1007/s10919-013-0162-0
- Clark, H. H., and Krych, M. A. (2004). Speaking while Monitoring Addressees for Understanding. J. Mem. Lang. 50, 62-81. doi:10.1016/j.jml.2003.08.004
- Cummins, F. (2012). Gaze and Blinking in Dyadic Conversation: A Study in Coordinated Behaviour Among Individuals. Lang. Cogn. Process. 27, 1525-1549. doi:10.1080/01690965.2011.615220
- Cvejic, E., Kim, J., Davis, C., and Gibert, G. (2010). "Prosody for the Eyes: Quantifying Visual Prosody Using Guided Principal Component Analysis," in Proceedings of the 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, 1433-1436.
- Danner, S. G. (2017). Effects of Speech Context on Characteristics of Manual Gesture. Doctoral dissertation. Los Angeles, CA: University of Southern California.
- Danner, S. G., Barbosa, A. V., and Goldstein, L. (2018). Quantitative Analysis of Multimodal Speech Data. J. Phonetics 71, 268-283. doi:10.1016/j.wocn.2018.09.007
- Davies, M. (2008). The Corpus of Contemporary American English: 450 Million Words, 1990-present. Available at: http://corpus.byu.edu/coca/.
- Dideriksen, C., Fusaroli, R., Tylén, K., Dingemanse, M., and Christiansen, M. H. (2019). "Contextualizing Conversational Strategies: Backchannel, Repair and Linguistic Alignment in Spontaneous and Task-Oriented Conversations," in Proceedings of the 41st Annual Conference of the Cognitive Science Society: Creativity + Cognition + Computation. Editors A. K. Goel, C. M. Seifert, and C. Freksa (Montreal, Canada), 261-267. doi:10.31234/osf.io/fd8y9
- Duncan, S., Jr. (1972). Some Signals and Rules for Taking Speaking Turns in Conversations. J. Personal. Soc. Psychol. 23, 283-292. doi:10.1037/h0033031
- Flecha-García, M. L. (2010). Eyebrow Raises in Dialogue and Their Relation to Discourse Structure, Utterance Function and Pitch Accents in English. Speech Commun. 52, 542-554. doi:10.1016/j.specom.2009.12.003
- Freelon, D. (2013). ReCal OIR: Ordinal, Interval, and Ratio Intercoder Reliability as a Web Service. Int. J. Internet Sci. 8, 10-16. Available at: http://www.ijis.net/ijis8_1/ijis8_1_freelon_pre.html.
- Fuchs, S., and Reichel, U. D. (2016). "On the Relationship between Pointing Gestures and Speech Production in German Counting Out Rhymes: Evidence from Motion Capture Data and Speech Acoustics," in Proceedings of P&P 12. Editors C. Draxler and F. Kleber (Munich: Ludwig Maximilian University), 1-4.
- Gamer, M., Lemon, J., Fellows, I., and Singh, P. (2019). irr: Various Coefficients of Interrater Reliability and Agreement. Available at: https://cran.r-project.org/package=irr.
- Garrod, S., and Pickering, M. J. (2015). The Use of Content and Timing to Predict Turn Transitions. Front. Psychol. 6, 1-12. doi:10.3389/fpsyg.2015.00751
- Geluykens, R., and Swerts, M. (1992). Prosodic Topic- and Turn-Finality Cues. Proceedings of the IRCS Workshop on Prosody in Natural Speech, Netherlands, 63-70.
- Gillespie, M., James, A. N., Federmeier, K. D., and Watson, D. G. (2014). Verbal Working Memory Predicts Co-speech Gesture: Evidence from Individual Differences. Cognition 132, 174-180. doi:10.1016/j.cognition.2014.03.012
- Goldin-Meadow, S., Nusbaum, H., Kelly, S. D., and Wagner, S. (2001). Explaining Math: Gesturing Lightens the Load. Psychol. Sci. 12, 516-522. doi:10.1111/1467-9280.00395
- Gordon Danner, S., Krivokapić, J., and Byrd, D. (2021). Dataset for Co-speech Movement in Conversational Turn-Taking. Mendeley Data, V1. doi:10.17632/jy5t72fd32.1
- Goujon, A., Bertrand, R., and Tellier, M. (2015). Eyebrows in French Talk-In-Interaction. Nantes, France.
- Granum, M. (2017). NurseryRhymes.org - Nursery Rhymes with Lyrics and Music. Available at: https://www.nurseryrhymes.org/ (Accessed November 6, 2018).
- Guaïtella, I., Santi, S., Lagrue, B., and Cavé, C. (2009). Are Eyebrow Movements Linked to Voice Variations and Turn-Taking in Dialogue? An Experimental Investigation. Lang. Speech 52, 207-222. doi:10.1177/0023830909103167
- Gullberg, M. (2010). Language-specific Encoding of Placement Events in Gestures. Event Representation Lang. Cogn. 11, 166-188. doi:10.1017/CBO9780511782039.008
- Hadar, U., Steiner, T. J., and Clifford Rose, F. (1985). Head Movement during Listening Turns in Conversation. J. Nonverbal Behav. 9, 214-228. doi:10.1007/bf00986881
- Hadar, U., Steiner, T. J., Grant, E. C., and Rose, F. C. (1983). Kinematics of Head Movements Accompanying Speech during Conversation. Hum. Movement Sci. 2, 35-46. doi:10.1016/0167-9457(83)90004-0
- Hilton, K. (2018). What Does an Interruption Sound like? Stanford, CA: Stanford University.
- Hoetjes, M., Krahmer, E., and Swerts, M. (2014). Does Our Speech Change when We Cannot Gesture? Speech Commun. 57, 257-267. doi:10.1016/j.specom.2013.06.007
- Holler, J., Kendrick, K. H., and Levinson, S. C. (2017). Processing Language in Face-To-Face Conversation: Questions with Gestures Get Faster Responses. Psychon. Bull. Rev. 25, 1900-1908. doi:10.3758/s13423-017-1363-z
- Holler, J., Schubotz, L., Kelly, S., Hagoort, P., Schuetze, M., and Özyürek, A. (2014). Social Eye Gaze Modulates Processing of Speech and Co-speech Gesture. Cognition 133, 692-697. doi:10.1016/j.cognition.2014.08.008
- Hömke, P., Holler, J., and Levinson, S. C. (2018). Eye Blinks Are Perceived as Communicative Signals in Human Face-To-Face Interaction. PLoS ONE 13, e0208030. doi:10.1371/journal.pone.0208030
- Ishi, C. T., Ishiguro, H., and Hagita, N. (2014). Analysis of Relationship between Head Motion Events and Speech in Dialogue Conversations. Speech Commun. 57, 233-243. doi:10.1016/j.specom.2013.06.008
- Kelly, S. D., Özyürek, A., and Maris, E. (2010). Two Sides of the Same Coin. Psychol. Sci. 21, 260-267. doi:10.1177/0956797609357327
- Kendon, A. (1972). "Some Relationships between Body Motion and Speech," in Studies In Dyadic Communication. Editors A. W. Siegman and B. Pope (New York: Pergamon Press), 177-210. doi:10.1016/b978-0-08-015867-9.50013-7
- Kim, J., Cvejic, E., and Davis, C. (2014). Tracking Eyebrows and Head Gestures Associated with Spoken Prosody. Speech Commun. 57, 317-330. doi:10.1016/j.specom.2013.06.003
- Kita, S. (2009). Cross-cultural Variation of Speech-Accompanying Gesture: A Review. Lang. Cogn. Process. 24 (2), 145-167. doi:10.1080/01690960802586188
- Kita, S., and Özyürek, A. (2003). What Does Cross-Linguistic Variation in Semantic Coordination of Speech and Gesture Reveal?: Evidence for an Interface Representation of Spatial Thinking and Speaking. J. Mem. Lang. 48, 16-32. doi:10.1016/S0749-596X(02)00505-3
- Krahmer, E., and Swerts, M. (2007). The Effects of Visual Beats on Prosodic Prominence: Acoustic Analyses, Auditory Perception and Visual Perception. J. Mem. Lang. 57, 396-414. doi:10.1016/j.jml.2007.06.005
- Krauss, R. M. (1998). Why Do We Gesture when We Speak? Curr. Dir. Psychol. Sci. 7, 54. doi:10.1111/1467-8721.ep13175642
- Kuznetsova, A., Bruun Brockhoff, P., and Haubo Bojesen Christensen, R. (2014). lmerTest: Tests in Linear Mixed Effects Models. Available at: http://cran.r-project.org/package=lmerTest.
- Latif, N., Barbosa, A. V., Vatikiotis-Bateson, E., Castelhano, M. S., and Munhall, K. G. (2014). Movement Coordination during Conversation. PLoS ONE 9, e105036. doi:10.1371/journal.pone.0105036
- Lee, C.-C., and Narayanan, S. (2010). "Predicting Interruptions in Dyadic Spoken Interactions," in Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14-19 March 2010 (IEEE), 5250-5253. doi:10.1109/ICASSP.2010.5494991
- Lee, Y., Gordon Danner, S., Parrell, B., Lee, S., Goldstein, L., and Byrd, D. (2018). Articulatory, Acoustic, and Prosodic Accommodation in a Cooperative Maze Navigation Task. PLOS ONE 13, e0201444. doi:10.1371/journal.pone.0201444
- Leonard, T., and Cummins, F. (2011). The Temporal Relation between Beat Gestures and Speech. Lang. Cogn. Process. 26, 1457-1471. doi:10.1080/01690965.2010.500218
- Levelt, W. J. M., Richardson, G., and la Heij, W. (1985). Pointing and Voicing in Deictic Expressions. J. Mem. Lang. 24, 133-164. doi:10.1016/0749-596X(85)90021-X
- Levinson, S. C., and Holler, J. (2014). The Origin of Human Multi-Modal Communication. Phil. Trans. R. Soc. B 369, 20130302. doi:10.1098/rstb.2013.0302
- Levinson, S. C., and Torreira, F. (2015). Timing in Turn-Taking and its Implications for Processing Models of Language. Front. Psychol. 6, 731. doi:10.3389/fpsyg.2015.00731
- Levitan, R., Beňuš, Š., Gravano, A., and Hirschberg, J. (2015). Entrainment and Turn-Taking in Human-Human Dialogue. Turn-Taking and Coordination in Human-Machine Interaction: Papers from the 2015 AAAI Spring Symposium, 44-51.
- Loehr, D. P. (2004). Gesture and Intonation. PhD thesis. Washington, DC: Georgetown University.
- Magyari, L., Bastiaansen, M. C. M., de Ruiter, J. P., and Levinson, S. C. (2014). Early Anticipation Lies behind the Speed of Response in Conversation. J. Cogn. Neurosci. 26, 2530-2539. doi:10.1162/jocn_a_00673
- MATLAB (2018). Version R2018a. Natick, MA: The MathWorks Inc.
- McClave, E. Z. (2000). Linguistic Functions of Head Movements in the Context of Speech. J. Pragmatics 32, 855-878. doi:10.1016/S0378-2166(99)00079-X
- Melinger, A., and Kita, S. (2007). Conceptualisation Load Triggers Gesture Production. Lang. Cogn. Process. 22, 473-500. doi:10.1080/01690960600696916
- Mondada, L. (2007). Multimodal Resources for Turn-Taking. Discourse Stud. 9, 194-225. doi:10.1177/1461445607075346
- Munhall, K. G., Jones, J. A., Callan, D. E., Kuratate, T., and Vatikiotis-Bateson, E. (2004). Visual Prosody and Speech Intelligibility. Psychol. Sci. 15, 133-137. doi:10.1111/j.0963-7214.2004.01502010.x
- Munhall, K. G., Ostry, D. J., and Parush, A. (1985). Characteristics of Velocity Profiles of Speech Movements. J. Exp. Psychol. Hum. Perception Perform. 11, 457-474. doi:10.1037/0096-1523.11.4.457
- Nota, N., Trujillo, J. P., and Holler, J. (2021). Facial Signals and Social Actions in Multimodal Face-To-Face Interaction. Brain Sci. 11, 1017. doi:10.3390/brainsci11081017
- Ostry, D. J., Cooke, J. D., and Munhall, K. G. (1987). Velocity Curves of Human Arm and Speech Movements. Exp. Brain Res. 68, 37-46. doi:10.1007/BF00255232
- Özyürek, A., Kita, S., Allen, S. E. M., Furman, R., and Brown, A. (2005). How Does Linguistic Framing of Events Influence Co-speech Gestures? Gest 5, 219-240. doi:10.1075/gest.5.1.15ozy
- Prieto, P., Puglesi, C., Borràs-Comes, J., Arroyo, E., and Blat, J. (2015). Exploring the Contribution of Prosody and Gesture to the Perception of Focus Using an Animated Agent. J. Phonetics 49, 41-54. doi:10.1016/j.wocn.2014.10.005
- R Core Team (2021). R: A Language and Environment for Statistical Computing. Available at: https://www.gbif.org/tool/81287/r-a-language-and-environment-for-statistical-computing. doi:10.1007/978-3-540-74686-7
- Roberts, S. G., Torreira, F., and Levinson, S. C. (2015). The Effects of Processing and Sequence Organization on the Timing of Turn Taking: A Corpus Study. Front. Psychol. 6, 509. doi:10.3389/fpsyg.2015.00509
- Rochet-Capellan, A., and Fuchs, S. (2014). Take a Breath and Take the Turn: How Breathing Meets Turns in Spontaneous Dialogue. Phil. Trans. R. Soc. B 369, 20130399. doi:10.1098/rstb.2013.0399
- Ruiter, J.-P. d., Mitterer, H., and Enfield, N. J. (2006). Projecting the End of a Speaker's Turn: A Cognitive Cornerstone of Conversation. Language 82, 515-535. doi:10.1353/lan.2006.0130
- Scobbie, J. M., Turk, A., Geng, C., King, S., Lickley, R., and Richmond, K. (2013). "The Edinburgh Speech Production Facility Doubletalk Corpus," in Proceedings of the Annual Conference of the International Speech Communication Association, August 2013, 764-766.
- Sikveland, R. O., and Ogden, R. (2012). Holding Gestures across Turns. Gest 12, 166-199. doi:10.1075/gest.12.2.03sik
- Singmann, H., Bolker, B., Westfall, J., Aust, F., and Ben-Shachar, M. S. (2021). afex: Analysis of Factorial Experiments. R package version 1.0-1. Available at: https://CRAN.R-project.org/package=afex.
- Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., et al. (2009). Universals and Cultural Variation in Turn-Taking in Conversation. Proc. Natl. Acad. Sci. 106, 10587-10592. doi:10.1073/pnas.0903616106
- Tiede, M., Bundgaard-Nielsen, R., Kroos, C., Gibert, G., Attina, V., Kasisopa, B., et al. (2010). Speech Articulator Movements Recorded from Facing Talkers Using Two Electromagnetic Articulometer Systems Simultaneously. J. Acoust. Soc. America 128, 1-9. doi:10.1121/1.3508805
- Trujillo, J. P., Levinson, S. C., and Holler, J. (2021). "Visual Information in Computer-Mediated Interaction Matters: Investigating the Association between the Availability of Gesture and Turn Transition Timing in Conversation," in Human-Computer Interaction. Design and User Experience Case Studies. HCII 2021. Lecture Notes in Computer Science. Editor M. Kurosu (Berlin/Heidelberg, Germany: Springer), 12764, 643-657. doi:10.1007/978-3-030-78468-3_44
- Vilela Barbosa, A., Déchaine, R.-M., Vatikiotis-Bateson, E., and Camille Yehia, H. (2012). Quantifying Time-Varying Coordination of Multimodal Speech Signals Using Correlation Map Analysis. J. Acoust. Soc. America 131, 2162-2172. doi:10.1121/1.3682040
- Voigt, R., Eckert, P., Jurafsky, D., and Podesva, R. J. (2016). Cans and Cants: Computational Potentials for Multimodality with a Case Study in Head Position. J. Sociolinguistics 20, 677-711. doi:10.1111/josl.12216
- Wagner, P., Malisz, Z., and Kopp, S. (2014). Gesture and Speech in Interaction: An Overview. Speech Commun. 57, 209-232. doi:10.1016/j.specom.2013.09.008
- Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, J., et al. (2019). Welcome to the Tidyverse. J. Open Source Softw. 4 (43), 1686. doi:10.21105/joss.01686
- Yehia, H., Rubin, P., and Vatikiotis-Bateson, E. (1998). Quantitative Association of Vocal-Tract and Facial Behavior. Speech Commun. 26, 23-43. doi:10.1016/s0167-6393(98)00048-x
- Yuan, J., and Liberman, M. (2008). Speaker Identification on the SCOTUS Corpus. Proc. Acoust. '08, 6-9. doi:10.1121/1.2935783
- Zellers, M., Gorisch, J., House, D., and Peters, B. (2019). "Hand Gestures and Pitch Contours and Their Distribution at Possible Speaker Change Locations: A First Investigation," in Gesture and Speech in Interaction. 6th edition (Paderborn: GESPIN), 93-98.