Journal on multimodal user interfaces

An audio-visual dataset of human-human interactions in stressful situations

Abstract

Stressful situations are likely to occur at human-operated service desks, as well as at human-computer interfaces used in the public domain. Automatic surveillance can help by signaling when extra assistance is needed. Human communication is inherently multimodal, involving, for example, speech, gestures, and facial expressions, so automatic surveillance systems are expected to benefit from exploiting multimodal information. This requires automatic fusion of modalities, which is still an unsolved problem. To support the development of such systems, we present and analyze audio-visual recordings of human-human interactions at a service desk. The corpus has a high degree of realism: all interactions were freely improvised by actors based on short scenarios in which only the sources of conflict were specified. The recordings can be considered a prototype for stressful human-human interaction in general. They were annotated on a 5-point scale for degree of stress, from the perspective of surveillance operators, and are very rich in hand gestures. We find that the more stressful the situation, the higher the proportion of speech that is accompanied by gestures. Understanding the function of gestures and their relation to speech is essential for good fusion strategies. Taking speech as the basic modality, one of our research questions was what role gestures play in addition to speech. Both speech and gestures can express emotion, in which case we say they have an emotional function; they can also express non-emotional information, in which case we say they have a semantic function. We observe that when speech and gestures have the same function, they are usually congruent, although their intensity and clarity can vary. Most gestures in this dataset convey emotion. We identify classes of gestures in our recordings and argue that some classes are clear indications of stressful situations.
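
As a minimal, hypothetical sketch of how the reported relation between stress level and gesture-accompanied speech could be checked against such annotations: for each annotated stress level (1-5), count the speech segments that overlap a hand gesture and divide by the total. The record layout and the toy data below are invented for illustration, not taken from the paper's release.

    from collections import defaultdict

    # Hypothetical annotation records: (stress_level, speech_segment_has_gesture).
    # In a real corpus these would be read from the annotation files.
    segments = [
        (1, False), (1, True), (2, True), (3, True),
        (4, True), (4, True), (5, True), (5, False),
    ]

    totals = defaultdict(int)        # speech segments per stress level
    with_gesture = defaultdict(int)  # of those, segments accompanied by a gesture

    for level, has_gesture in segments:
        totals[level] += 1
        if has_gesture:
            with_gesture[level] += 1

    for level in sorted(totals):
        proportion = with_gesture[level] / totals[level]
        print(f"stress level {level}: {proportion:.0%} of speech segments "
              f"accompanied by gestures")

Plotting this proportion against stress level would make the claimed trend directly visible; the toy records above are too few to show it.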
