Multimedia Systems

Multimodal shared features learning for emotion recognition by enhanced sparse local discriminative canonical correlation analysis



Abstract

Multimodal emotion recognition is a challenging research topic that has recently started to attract the attention of the research community. To better recognize video users' emotions, research on multimodal emotion recognition based on audio and video is essential. Multimodal emotion recognition performance heavily depends on finding a good shared feature representation. A good shared representation needs to satisfy two requirements: (1) it preserves the characteristics of each modality, and (2) it balances the influence of the different modalities so that the final decision is optimal. In light of these, we propose a novel Enhanced Sparse Local Discriminative Canonical Correlation Analysis approach (En-SLDCCA) to learn the multimodal shared feature representation. Learning the shared feature representation involves two stages. In the first stage, we pretrain a Sparse Auto-Encoder on each unimodal stream (video or audio), so that we obtain the hidden feature representations of video and audio separately. In the second stage, we obtain the correlation coefficients of video and audio using our En-SLDCCA approach, and then form the shared feature representation by fusing the video and audio features with these correlation coefficients. We evaluate the performance of our method on the challenging multimodal eNTERFACE'05 database. Experimental results reveal that our method is superior to unimodal video (or audio) alone and significantly improves multimodal emotion recognition performance compared with the current state of the art.
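For reference, the classical CCA objective that En-SLDCCA builds on (per its name, augmented with sparsity and local discriminative constraints; the exact extended formulation is not given in this abstract) seeks a pair of projection vectors that maximize the correlation between the hidden features of the two modalities:

\[
\max_{\mathbf{w}_v,\,\mathbf{w}_a}\;
\frac{\mathbf{w}_v^{\top} C_{va}\,\mathbf{w}_a}
     {\sqrt{\mathbf{w}_v^{\top} C_{vv}\,\mathbf{w}_v}\;
      \sqrt{\mathbf{w}_a^{\top} C_{aa}\,\mathbf{w}_a}}
\]

where \(C_{vv}\) and \(C_{aa}\) are the within-modality covariance matrices of the video and audio hidden features and \(C_{va}\) is their cross-covariance.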
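A minimal sketch of the two-stage pipeline described above, under stated substitutions: an ordinary MLP autoencoder stands in for the Sparse Auto-Encoder, scikit-learn's standard CCA stands in for the proposed En-SLDCCA, and the features are random placeholders rather than real audio/video descriptors.

```python
# Sketch only: plain autoencoder + standard CCA as stand-ins for the
# paper's Sparse Auto-Encoder and En-SLDCCA; data are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_samples = 200
X_video = rng.normal(size=(n_samples, 64))   # placeholder video features
X_audio = rng.normal(size=(n_samples, 32))   # placeholder audio features

def pretrain_autoencoder(X, hidden_dim):
    """Stage 1: train an autoencoder on one modality, return its hidden codes."""
    ae = MLPRegressor(hidden_layer_sizes=(hidden_dim,), activation="logistic",
                      max_iter=2000, random_state=0)
    ae.fit(X, X)                              # reconstruct the input
    # hidden representation = sigmoid(X W1 + b1) of the encoding layer
    return 1.0 / (1.0 + np.exp(-(X @ ae.coefs_[0] + ae.intercepts_[0])))

H_video = pretrain_autoencoder(X_video, hidden_dim=16)
H_audio = pretrain_autoencoder(X_audio, hidden_dim=16)

# Stage 2: learn correlated projections of the two hidden representations and
# concatenate them as the shared feature representation fed to a classifier.
cca = CCA(n_components=8)
Z_video, Z_audio = cca.fit_transform(H_video, H_audio)
shared = np.hstack([Z_video, Z_audio])
print(shared.shape)                           # (200, 16)
```

In this sketch the fusion is a simple concatenation of the projected views; the paper instead weights the fusion by the learned correlation coefficients, a detail the abstract does not spell out.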
