Human Language Technologies as a Challenge for Computer Science and Linguistics, 2023
Contemporary studies on interpersonal communication confirm that in order to understand and model this multifaceted process, not only speech itself but also other components, including gestures, facial expressions, and body postures, must be taken into account. In our contribution, we present a corpus developed to support and facilitate this research approach. The corpus comprises three major sections, representing monologues, dialogues and multilogues. While the monologue and multilogue sections are based on materials available in public services and archives, the dialogue section contains task-oriented dialogues recorded specifically for the present resource. The speakers are young to middle-aged Polish adults. Speech is transcribed orthographically and phonemically, and segmented into words, syllables and phones. Body movement annotation varies among the sections of the corpus. Synchronised depth-sensor data are available along with the dialogue recordings. In the monologue and multilogue subcorpora, manual annotation of selected categories of gestures is available. The resource will fill a significant gap in the body of Polish corpora and will hopefully encourage studies of multimodal communication from a number of perspectives. Potential applications of the corpus include education, media and industry. The corpus will be deposited and made freely available for research purposes in the CLARIN-PL infrastructure upon project completion.
Keywords: multimodal corpus, multimodal communication, Polish, persuasive speech, sports language
Papers by Janusz Taborek