Journal: Expert Systems with Applications

Speech emotion recognition using recurrent neural networks with directional self-attention

Abstract

As an important branch of affective computing, Speech Emotion Recognition (SER) plays a vital role in human-computer interaction. To mine the relevance of signals in audio and increase the diversity of information, this paper proposes Bi-directional Long Short-Term Memory with Directional Self-Attention (BLSTM-DSA). Long Short-Term Memory (LSTM) can learn long-term dependencies from learned local features. Moreover, Bi-directional Long Short-Term Memory (BLSTM) can make the structure more robust through its direction mechanism, because directional analysis can better recognize the hidden emotions in a sentence. At the same time, the autocorrelation of speech frames can be used to compensate for missing information, so a self-attention mechanism is introduced into SER. The attention weight of each frame is calculated from the outputs of the forward and backward LSTM separately, rather than after adding them together. Thus, the algorithm can automatically annotate the weights of speech frames to correctly select frames carrying emotional information in the temporal network. When evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Berlin Database of Emotional Speech (EMO-DB), BLSTM-DSA demonstrates satisfactory performance on the task of speech emotion recognition. In particular, BLSTM-DSA achieves the highest recognition accuracies for happiness and anger.
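
The idea of computing attention weights per direction, rather than on the summed outputs, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single scoring vector per direction (`w_fwd`, `w_bwd`), the softmax-weighted pooling, and the final concatenation are all assumptions made for illustration, and random arrays stand in for the forward and backward LSTM outputs.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(h, w):
    """Pool frame-level outputs h (T, d) into one vector of size d.

    w is a hypothetical scoring vector of shape (d,); a trained model
    would learn it. Returns the pooled vector and the frame weights.
    """
    scores = h @ w            # one scalar score per frame, shape (T,)
    alpha = softmax(scores)   # attention weights over frames, sum to 1
    return alpha @ h, alpha

def directional_attention(h_fwd, h_bwd, w_fwd, w_bwd):
    # Attention is computed separately for each direction,
    # instead of once on h_fwd + h_bwd.
    ctx_f, a_f = attention_pool(h_fwd, w_fwd)
    ctx_b, a_b = attention_pool(h_bwd, w_bwd)
    # Assumption: the two pooled vectors are concatenated.
    return np.concatenate([ctx_f, ctx_b]), a_f, a_b

# Placeholder data standing in for BLSTM outputs (T frames, d hidden units).
rng = np.random.default_rng(0)
T, d = 50, 8
h_fwd = rng.standard_normal((T, d))
h_bwd = rng.standard_normal((T, d))
w_fwd = rng.standard_normal(d)
w_bwd = rng.standard_normal(d)

utterance_vec, a_f, a_b = directional_attention(h_fwd, h_bwd, w_fwd, w_bwd)
```

With separate weights, a frame can receive high attention in the forward pass and low attention in the backward pass, which a single weighting over the summed outputs cannot express.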