LSTM-Modeling of continuous emotions in an audiovisual affect recognition framework


Abstract

Automatically recognizing human emotions from spontaneous and non-prototypical real-life data is currently one of the most challenging tasks in the field of affective computing. This article presents our recent advances in assessing dimensional representations of emotion, such as arousal, expectation, power, and valence, in an audiovisual human-computer interaction scenario. Building on previous studies which demonstrate that long-range context modeling tends to increase accuracies of emotion recognition, we propose a fully automatic audiovisual recognition approach based on Long Short-Term Memory (LSTM) modeling of word-level audio and video features. LSTM networks are able to incorporate knowledge about how emotions typically evolve over time so that the inferred emotion estimates are produced under consideration of an optimal amount of context. Extensive evaluations on the Audiovisual Sub-Challenge of the 2011 Audio/Visual Emotion Challenge show how acoustic, linguistic, and visual features contribute to the recognition of different affective dimensions as annotated in the SEMAINE database. We apply the same acoustic features as used in the challenge baseline system, whereas visual features are computed via a novel facial movement feature extractor. Comparing our results with the recognition scores of all Audiovisual Sub-Challenge participants, we find that the proposed LSTM-based technique leads to the best average recognition performance that has been reported for this task so far.
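To make the modeling idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: an LSTM regressor that maps a sequence of word-level audiovisual feature vectors to per-word estimates of the four affective dimensions (arousal, expectation, power, valence). The 120-dimensional fused feature vector, the hidden size, and the unidirectional single-layer architecture are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (assumed architecture, not the paper's exact model):
# an LSTM mapping word-level audiovisual features to continuous
# emotion estimates (arousal, expectation, power, valence).
import torch
import torch.nn as nn

class AudiovisualLSTM(nn.Module):
    def __init__(self, feat_dim=120, hidden_dim=64, num_dims=4):
        super().__init__()
        # The recurrent state carries long-range context across words,
        # so each estimate reflects how the emotion has evolved so far.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_dims)

    def forward(self, x):
        # x: (batch, num_words, feat_dim) -- per-word fused acoustic,
        # linguistic, and facial-movement features (dimensions assumed).
        out, _ = self.lstm(x)
        return self.head(out)  # (batch, num_words, 4): one estimate per word

# Toy usage: 8 utterances, 25 words each, 120-dim fused features per word.
model = AudiovisualLSTM()
features = torch.randn(8, 25, 120)
predictions = model(features)  # shape: (8, 25, 4)
```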

Bibliographic Information

  • Source
    Image and Vision Computing | 2013, Issue 2 | pp. 153-163 | 11 pages
  • Author Affiliation

    Institute for Human-Machine Communication, Technische Universitaet Muenchen, Theresienstr. 90, 80333 Muenchen, Germany

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • CLC Classification:
  • Keywords

    emotion recognition; long short-term memory; facial movement features; context modeling
