ACM International Conference on Multimodal Interaction

Combining Video, Audio and Lexical Indicators of Affect in Spontaneous Conversation via Particle Filtering

Abstract

We present experiments on fusing facial video, audio, and lexical indicators for affect estimation during dyadic conversations. We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression-based affect estimators for each modality. The single-modality regressors are then combined using particle filtering: their independent regression outputs are treated as measurements of the affect states in a Bayesian filtering framework, where previous observations provide a prediction about the current state by means of learned affect dynamics. Tested on the Audio-visual Emotion Recognition Challenge dataset, our single-modality estimators achieve substantially higher scores than the official baseline method for every dimension of affect. Our filtering-based multi-modality fusion achieves correlation performance of 0.344 (baseline: 0.136) and 0.280 (baseline: 0.096) for the fully continuous and word-level sub-challenges, respectively.
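The fusion scheme described in the abstract can be sketched as a standard bootstrap particle filter. The sketch below is illustrative only, not the authors' implementation: it assumes hypothetical AR(1) affect dynamics (`a`, `process_std` standing in for the learned dynamics) and independent Gaussian measurement models for each modality's regression output (`meas_std` per modality); none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_fusion(measurements, n_particles=500,
                           a=0.9, process_std=0.1, meas_std=(0.3, 0.3, 0.3)):
    """Fuse per-modality affect estimates (T x M array) via particle filtering.

    Hypothetical model: AR(1) dynamics x_t = a * x_{t-1} + process noise,
    with each modality's regressor output treated as an independent
    Gaussian measurement of the latent affect state.
    """
    T, M = measurements.shape
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = np.empty(T)
    for t in range(T):
        # Predict: propagate particles through the (assumed) affect dynamics
        particles = a * particles + rng.normal(0.0, process_std, n_particles)
        # Update: weight particles by the likelihood of every modality's output
        log_w = np.zeros(n_particles)
        for m in range(M):
            log_w += -0.5 * ((measurements[t, m] - particles) / meas_std[m]) ** 2
        weights = np.exp(log_w - log_w.max())
        weights /= weights.sum()
        # Posterior-mean estimate of the affect state at time t
        estimates[t] = np.sum(weights * particles)
        # Multinomial resampling to avoid weight degeneracy
        idx = rng.choice(n_particles, n_particles, p=weights)
        particles = particles[idx]
    return estimates
```

With agreeing modality measurements, the posterior mean settles near the measured value while the dynamics term smooths transient disagreements between modalities.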
