Pattern Recognition Letters

Leveraging recent advances in deep learning for audio-visual emotion recognition


Abstract

Emotional expressions are the behaviors that communicate our emotional state or attitude to others. They are expressed through verbal and non-verbal communication. Complex human behavior can be understood by studying physical features from multiple modalities; mainly facial, vocal and physical gestures. Recently, spontaneous multi-modal emotion recognition has been extensively studied for human behavior analysis. In this paper, we propose a new deep learning-based approach for audio-visual emotion recognition. Our approach leverages recent advances in deep learning like knowledge distillation and high-performing deep architectures. The deep feature representations of the audio and visual modalities are fused based on a model-level fusion strategy. A recurrent neural network is then used to capture the temporal dynamics. Our proposed approach substantially outperforms state-of-the-art approaches in predicting valence on the RECOLA dataset. Moreover, our proposed visual facial expression feature extraction network outperforms state-of-the-art results on the AffectNet and Google Facial Expression Comparison datasets. (c) 2021 Elsevier B.V. All rights reserved.
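The abstract describes a pipeline in which per-modality deep features are combined by model-level fusion and a recurrent network then models temporal dynamics before valence is predicted. The sketch below is a minimal PyTorch illustration of that dataflow only, not the authors' implementation: the feature dimensions, the choice of a GRU, and the tanh-bounded regression head are assumptions made for the example.

```python
import torch
import torch.nn as nn

class AudioVisualValenceModel(nn.Module):
    """Model-level fusion sketch: concatenate audio and visual embeddings,
    fuse them with a linear layer, model temporal dynamics with a GRU,
    and regress a per-frame valence value."""

    def __init__(self, audio_dim=128, visual_dim=512, hidden_dim=256):
        super().__init__()
        # Fuse the two modality embeddings at the feature (model) level.
        self.fusion = nn.Linear(audio_dim + visual_dim, hidden_dim)
        # Recurrent layer over the fused sequence captures temporal context.
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Frame-wise valence regression head.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, audio_feats, visual_feats):
        # audio_feats:  (batch, time, audio_dim)  -- e.g. from an audio encoder
        # visual_feats: (batch, time, visual_dim) -- e.g. from a facial-expression network
        fused = torch.relu(self.fusion(torch.cat([audio_feats, visual_feats], dim=-1)))
        out, _ = self.rnn(fused)
        # Bound the prediction to [-1, 1], a common convention for valence.
        return torch.tanh(self.head(out)).squeeze(-1)

# Usage with placeholder features (2 clips, 50 time steps each)
model = AudioVisualValenceModel()
audio = torch.randn(2, 50, 128)
visual = torch.randn(2, 50, 512)
valence = model(audio, visual)  # shape: (2, 50)
```

Concatenation followed by a shared recurrent layer is only one way to realize model-level fusion; the key point the abstract makes is that fusion happens on learned feature representations rather than on raw inputs or final decisions.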

