Cybernetics, IEEE Transactions on

Temporal Bayesian Fusion for Affect Sensing: Combining Video, Audio, and Lexical Modalities



Abstract

The affective state of people changes in the course of conversations, and these changes are expressed externally through a variety of channels, including facial expressions, voice, and spoken words. Recent advances in automatic sensing of affect through cues in individual modalities have been remarkable, yet emotion recognition is far from a solved problem. Recently, researchers have turned their attention to multimodal affect sensing in the hope that combining different information sources would yield substantial improvements. However, reported results fall short of these expectations, indicating only modest benefits and occasionally even degradation in performance. We develop temporal Bayesian fusion for continuous real-valued estimation of the valence, arousal, power, and expectancy dimensions of affect by combining video, audio, and lexical modalities. Our approach provides substantial gains in recognition performance compared to previous work. This is achieved by using a powerful temporal prediction model as the prior in Bayesian fusion and by incorporating uncertainties about the unimodal predictions. The temporal prediction model exploits temporal correlations in the affect sequences and employs estimated temporal biases to control the affect estimates at the beginning of conversations. In contrast to other recent methods for combining modalities, our model is simpler: it does not model relationships between modalities and involves only a few interpretable parameters to be estimated from the training data.
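To make the fusion scheme concrete, the sketch below shows precision-weighted Gaussian fusion of per-modality predictions with a temporal prior. The function names, the AR(1) form of the prior, and all parameter values are illustrative assumptions, not the paper's exact model; the paper's temporal prediction model and estimated temporal biases are only approximated here by the rho and bias terms.

    import numpy as np

    def fuse_step(prior_mean, prior_var, preds_t, vars_t):
        # Precision-weighted Gaussian fusion of the temporal prior with the
        # unimodal predictions for one frame.
        prec = 1.0 / np.asarray(vars_t, dtype=float)
        prior_prec = 1.0 / prior_var
        post_prec = prior_prec + prec.sum()
        post_mean = (prior_prec * prior_mean
                     + (prec * np.asarray(preds_t, dtype=float)).sum()) / post_prec
        return post_mean, 1.0 / post_prec

    def temporal_bayesian_fusion(preds, pred_vars, rho=0.9, bias=0.0,
                                 init_mean=0.0, init_var=1.0, noise_var=0.1):
        # preds, pred_vars: (T, M) arrays of per-frame, per-modality affect
        # predictions and their variances (uncertainties). rho and bias define
        # a hypothetical AR(1) temporal prior y_t ~ N(rho * y_{t-1} + bias,
        # noise_var); the bias term steers estimates near the start of a
        # conversation, loosely mirroring the estimated temporal biases in
        # the abstract.
        mean, var = init_mean, init_var
        fused = []
        for t in range(len(preds)):
            prior_mean = rho * mean + bias          # temporal prediction (prior)
            prior_var = rho ** 2 * var + noise_var
            mean, var = fuse_step(prior_mean, prior_var, preds[t], pred_vars[t])
            fused.append(mean)
        return np.array(fused)

    # Example: 100 frames, three modalities (video, audio, lexical).
    rng = np.random.default_rng(0)
    preds = rng.normal(0.0, 0.3, size=(100, 3))
    pred_vars = np.full((100, 3), 0.5)
    valence_track = temporal_bayesian_fusion(preds, pred_vars)

Because the combination is precision weighted, a modality that reports high variance at a given frame contributes little to the fused estimate at that frame, which is one plausible way to incorporate the unimodal uncertainties the abstract mentions.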
