Cybernetics, IEEE Transactions on

Temporal Bayesian Fusion for Affect Sensing: Combining Video, Audio, and Lexical Modalities



Abstract

The affective state of people changes in the course of conversations, and these changes are expressed externally through a variety of channels, including facial expressions, voice, and spoken words. Recent advances in automatic sensing of affect through cues in individual modalities have been remarkable, yet emotion recognition is far from a solved problem. Recently, researchers have turned their attention to multimodal affect sensing in the hope that combining different information sources would yield substantial improvements. However, reported results fall short of these expectations, indicating only modest benefits and occasionally even degradation in performance. We develop temporal Bayesian fusion for continuous real-valued estimation of the valence, arousal, power, and expectancy dimensions of affect by combining video, audio, and lexical modalities. Our approach provides substantial gains in recognition performance compared to previous work. This is achieved by using a powerful temporal prediction model as the prior in Bayesian fusion and by incorporating uncertainties about the unimodal predictions. The temporal prediction model exploits temporal correlations in the affect sequences and employs estimated temporal biases to control the affect estimates at the beginning of conversations. In contrast to other recent methods for combining modalities, our model is simpler: it does not model relationships between modalities and involves only a few interpretable parameters to be estimated from the training data.
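To make the fusion scheme concrete, the sketch below shows precision-weighted Gaussian fusion of per-modality predictions with a temporal prior. The function names, the AR(1) form of the prior, and all parameter values are illustrative assumptions, not the paper's exact model; the paper's temporal prediction model and estimated temporal biases are only approximated here by the rho and bias terms.

    import numpy as np

    def fuse_step(prior_mean, prior_var, preds_t, vars_t):
        # Precision-weighted Gaussian fusion of the temporal prior with the
        # unimodal predictions for one frame.
        prec = 1.0 / np.asarray(vars_t, dtype=float)
        prior_prec = 1.0 / prior_var
        post_prec = prior_prec + prec.sum()
        post_mean = (prior_prec * prior_mean
                     + (prec * np.asarray(preds_t, dtype=float)).sum()) / post_prec
        return post_mean, 1.0 / post_prec

    def temporal_bayesian_fusion(preds, pred_vars, rho=0.9, bias=0.0,
                                 init_mean=0.0, init_var=1.0, noise_var=0.1):
        # preds, pred_vars: (T, M) arrays of per-frame, per-modality affect
        # predictions and their variances (uncertainties). rho and bias define
        # a hypothetical AR(1) temporal prior y_t ~ N(rho * y_{t-1} + bias,
        # noise_var); the bias term steers estimates near the start of a
        # conversation, loosely mirroring the estimated temporal biases in
        # the abstract.
        mean, var = init_mean, init_var
        fused = []
        for t in range(len(preds)):
            prior_mean = rho * mean + bias          # temporal prediction (prior)
            prior_var = rho ** 2 * var + noise_var
            mean, var = fuse_step(prior_mean, prior_var, preds[t], pred_vars[t])
            fused.append(mean)
        return np.array(fused)

    # Example: 100 frames, three modalities (video, audio, lexical).
    rng = np.random.default_rng(0)
    preds = rng.normal(0.0, 0.3, size=(100, 3))
    pred_vars = np.full((100, 3), 0.5)
    valence_track = temporal_bayesian_fusion(preds, pred_vars)

Because the combination is precision weighted, a modality that reports high variance at a given frame contributes little to the fused estimate at that frame, which is one plausible way to incorporate the unimodal uncertainties the abstract mentions.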
