Journal: International journal of swarm intelligence

Deep bi-directional LSTM network with CNN features for human emotion recognition in audio-video signals

Abstract

Detecting human emotion in audio-video signals is a challenging task. This paper proposes a human emotion detection method based on a deep bi-directional long short-term memory (Bi-LSTM) network with convolutional neural network (CNN) features. First, it uses a transfer-learned Inception-ResNet V2 model to extract CNN features from the audio and video modalities. Next, two separate Bi-LSTM models, one for the audio channel and one for the video channel, learn the sequential information in the frame-wise CNN features. A weighted product rule-based decision-level fusion method then computes the final confidence scores from the output probabilities of the two independent Bi-LSTM models. The proposed approach is validated, tested, and compared with existing deep learning-based audio-video emotion detection methods on the challenging Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The experimental results show that the proposed approach outperforms the existing methods, attaining 81.03% validation accuracy and 83.98% testing accuracy for emotion detection on the RAVDESS dataset.
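
The decision-level fusion step can be illustrated with a minimal sketch. The abstract does not give the exact formula or the weights, so the code below assumes a common form of the weighted product rule: each modality's class probability is raised to a modality weight, the results are multiplied, and the scores are renormalised. The function name `weighted_product_fusion`, the weights `w_audio` and `w_video`, and the example probabilities are all hypothetical.

```python
import math

def weighted_product_fusion(p_audio, p_video, w_audio=0.5, w_video=0.5, eps=1e-12):
    """Fuse two per-class probability vectors with a weighted product rule.

    Assumed form (not taken from the paper): score_c = p_audio[c]**w_audio * p_video[c]**w_video,
    followed by renormalisation so the fused scores sum to 1.
    """
    if len(p_audio) != len(p_video):
        raise ValueError("both modalities must score the same set of classes")
    # Work in log space: w_a*log(p_a) + w_v*log(p_v) == log(p_a**w_a * p_v**w_v).
    # eps guards against log(0) when a model assigns zero probability.
    log_scores = [
        w_audio * math.log(max(pa, eps)) + w_video * math.log(max(pv, eps))
        for pa, pv in zip(p_audio, p_video)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Hypothetical softmax outputs of the audio and video Bi-LSTM models
# for three emotion classes (illustrative values only).
audio_probs = [0.6, 0.3, 0.1]
video_probs = [0.2, 0.7, 0.1]

fused = weighted_product_fusion(audio_probs, video_probs, w_audio=0.4, w_video=0.6)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this sketch the video model is weighted more heavily, so its preferred class (index 1) wins after fusion even though the audio model prefers index 0. In practice the weights would presumably be tuned on validation data.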
