IEEE Transactions on Affective Computing

CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset



Abstract

People convey their emotional state in their face and voice. We present an audio-visual dataset uniquely suited for the study of multi-modal emotion expression and perception. The dataset consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-valued intensity ratings for the perceived emotion were collected via crowd-sourcing from 2,443 raters. Human recognition of the intended emotion for the audio-only, visual-only, and audio-visual data is 40.9, 58.2, and 63.6 percent, respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. The accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be well recognized based on evidence from a single modality. The large dataset we introduce can be used to probe other questions concerning the audio-visual perception of emotion.
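The abstract describes collecting categorical emotion labels from multiple crowd-sourced raters per clip and modality. As a minimal illustrative sketch (not the dataset's documented aggregation procedure), one common way to derive a single perceived-emotion label per clip is a majority vote over rater responses; the tie-breaking rule below is an assumption for illustration:

```python
from collections import Counter

# The six basic emotional states used in CREMA-D.
EMOTIONS = ("happy", "sad", "anger", "fear", "disgust", "neutral")

def majority_label(ratings):
    """Return the most frequent emotion label among a clip's raters.

    Ties are broken by the order emotions appear in EMOTIONS; this
    tie-breaking rule is a hypothetical choice for illustration, not
    the dataset's documented procedure.
    """
    counts = Counter(ratings)
    best = max(counts.values())
    # Deterministic tie-break: first emotion in EMOTIONS with the max count.
    for emotion in EMOTIONS:
        if counts.get(emotion, 0) == best:
            return emotion

# Example: five hypothetical raters judging one audio-only clip.
print(majority_label(["anger", "anger", "disgust", "anger", "neutral"]))  # anger
```

Per-modality recognition rates such as those reported (40.9, 58.2, and 63.6 percent) can then be computed as the fraction of clips whose aggregated perceived label matches the actor's intended emotion.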

