Conference: International Conference on Automatic Face and Gesture Recognition

Visual Scene-aware Hybrid Neural Network Architecture for Video-based Facial Expression Recognition

Abstract

With the rapid development of deep learning, facial expression recognition (FER) technology has made considerable progress recently. However, since conventional FER techniques are mainly designed for and trained on videos acquired artificially in a limited environment, they may not operate robustly on videos acquired in a wild environment. To solve this problem, this paper proposes a scene-aware hybrid neural network (NN) with a novel combination of a three-dimensional (3D) convolutional NN (CNN), a 2D CNN, and a recurrent NN (RNN). The characteristics of the proposed network are as follows. First, we extract video-based global features and frame-based local features at the same time. In detail, the latent features containing the overall visual scene of a given video are extracted by a 3D CNN with an auxiliary classifier, and a fine-tuned 2D CNN is adopted to extract latent features containing small details from each frame. Second, the RNN not only performs temporal-domain learning but also fuses, feature-wise, the two latent features extracted from these networks. For effective fusion, we also present three RNN schemes. Third, the proposed network, in which the above-mentioned methods collaborate, works robustly in a wild environment as well as in a limited environment. Extensive experiments show that the proposed network achieves an average accuracy of 49.9% on the AFEW dataset, a representative wild dataset, and an accuracy of 98.2% on the CK+ dataset. We also show that the proposed network outperforms the state-of-the-art network(s).
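
The data flow described in the abstract (one video-level global feature, one local feature per frame, and an RNN that fuses both while stepping through time) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the fusion is plain concatenation (the abstract mentions three RNN schemes but does not describe them), the RNN is a basic Elman cell, the feature vectors are random stand-ins for the 3D CNN and 2D CNN outputs, and the sizes, including the seven output classes, are hypothetical.

```python
import math
import random


def fuse(global_feat, frame_feat):
    """Feature-wise fusion by concatenation (an assumed scheme, not the paper's)."""
    return list(frame_feat) + list(global_feat)


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rnn_classify(global_feat, frame_feats, W_in, W_h, W_out):
    """Run a plain Elman RNN over the fused per-frame inputs.

    The same global feature is fused with every frame's local feature,
    and the final hidden state is mapped to class probabilities.
    """
    h = [0.0] * len(W_h)
    for frame_feat in frame_feats:
        x = fuse(global_feat, frame_feat)
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_h, h))]
        h = [math.tanh(p) for p in pre]
    logits = matvec(W_out, h)
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Hypothetical sizes: global dim, local dim, hidden dim, classes, frames.
G, L, H, C, T = 8, 6, 5, 7, 4
global_feat = [rng.random() for _ in range(G)]  # stand-in for the 3D CNN output
frame_feats = [[rng.random() for _ in range(L)] for _ in range(T)]  # stand-ins for 2D CNN outputs
W_in = random_matrix(H, G + L, rng)
W_h = random_matrix(H, H, rng)
W_out = random_matrix(C, H, rng)

probs = rnn_classify(global_feat, frame_feats, W_in, W_h, W_out)
print(len(probs), sum(probs))
```

In this sketch the global feature is repeated at every time step, so the recurrent cell sees scene context alongside each frame's detail. Other fusion points, such as injecting the global feature only into the initial hidden state, would be equally plausible readings of the abstract.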