Conference: IEEE International Conference on Image Processing, Applications and Systems

Emotion Recognition on large video dataset based on Convolutional Feature Extractor and Recurrent Neural Network


Abstract

For many years, emotion recognition has remained one of the most interesting and important problems in the field of human-computer interaction. In this study, we treat emotion recognition as both a classification and a regression task, processing the encoded emotions of different datasets with deep learning models. Our model combines a convolutional neural network (CNN) with a recurrent neural network (RNN) to predict dimensional emotions from video data. In the first step, the CNN extracts feature vectors from video frames. In the second step, these feature vectors are fed to an RNN, which is trained to exploit the temporal dynamics of the video. We also analyze how each network contributes to the system's overall performance. The experiments are performed on publicly available datasets, including the largest modern database, Aff-Wild2, which contains over sixty hours of video data. Using confusion matrices as an illustrative example, we show that the model overfits when trained on an unbalanced dataset. We solve this problem by downsampling the dataset to balance it: although this significantly reduces the amount of training data, the balanced dataset improves the overall performance of the model. The study thus qualitatively describes the ability of deep learning models to predict facial emotions when given a sufficient amount of data. The proposed method is implemented in TensorFlow Keras, and the code is publicly available at https://github.com/DenisRang/Combined-CNN-RNN-for-emotion-recognition.
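The balancing step can be pictured with a minimal random-undersampling sketch. It assumes each training sample carries a single discrete class label (for example, one categorical expression class per frame) and keeps, for every class, only as many randomly chosen samples as the rarest class has. The function name `downsample_to_balance`, the fixed seed and the toy labels are hypothetical illustrations, not code from the authors' repository.

```python
import random
from collections import defaultdict


def downsample_to_balance(samples, labels, seed=0):
    """Randomly undersample every class to the size of the smallest class.

    Hypothetical helper: assumes exactly one discrete label per sample.
    Returns (balanced_samples, balanced_labels) in shuffled order.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if not labels:
        return [], []
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # The minority class decides how many samples every class keeps.
    target = min(len(idxs) for idxs in by_class.values())

    # Draw `target` indices per class without replacement, then shuffle.
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    rng.shuffle(kept)

    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Made-up imbalance: 'neutral' dominates the other classes.
    labels = ["neutral"] * 8 + ["happy"] * 3 + ["sad"] * 2
    samples = [f"frame_{i}" for i in range(len(labels))]
    xs, ys = downsample_to_balance(samples, labels)
    print(sorted(ys))  # ['happy', 'happy', 'neutral', 'neutral', 'sad', 'sad']
```

Random undersampling discards majority-class samples instead of reweighting them, which is why balancing the data this way shrinks the training set.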
