IEEE/ACM Transactions on Audio, Speech, and Language Processing

Speech Emotion Classification Using Attention-Based LSTM


Abstract

Automatic speech emotion recognition has been a research hotspot in human-computer interaction over the past decade. However, because the inherent temporal relationships within the speech waveform have received little attention, current recognition accuracy still needs improvement. To make full use of the differences in emotional saturation between time frames, a novel method for speech emotion recognition is proposed that combines frame-level speech features with attention-based long short-term memory (LSTM) recurrent neural networks. Frame-level speech features are extracted from the waveform to replace traditional statistical features, so that the timing relations of the original speech are preserved through the sequence of frames. To distinguish emotional saturation across frames, two attention-based improvement strategies for the LSTM are proposed: first, the computational complexity is reduced by modifying the forget gate of the traditional LSTM without sacrificing performance; second, at the final output of the LSTM, an attention mechanism is applied along both the time and the feature dimensions to extract task-relevant information, rather than using only the output of the last time step as in the traditional algorithm. Extensive experiments on the CASIA, eNTERFACE, and GEMEP emotion corpora demonstrate that the proposed approach outperforms the state-of-the-art algorithms reported to date.
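The abstract's second improvement, attention over both the time and the feature dimensions of the LSTM outputs, can be illustrated with a minimal NumPy sketch. This is an illustrative assumption, not the paper's exact formulation: `h` stands for the per-frame LSTM outputs, `w_t` is a hypothetical time-attention vector, and `w_f` a hypothetical feature-gating matrix with sigmoid gating.

```python
import numpy as np

def softmax(z, axis):
    # numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(h, w_t, w_f):
    """Pool frame-level LSTM outputs with time and feature attention.

    h   : (T, d) LSTM output for each of T frames
    w_t : (d,)   time-attention scoring weights (assumed)
    w_f : (d, d) feature-attention weights (assumed)
    Returns a single (d,) utterance-level vector for classification.
    """
    a_t = softmax(h @ w_t, axis=0)            # (T,) weight per frame
    pooled = a_t @ h                          # (d,) attention-weighted sum over time
    a_f = 1.0 / (1.0 + np.exp(-(w_f @ pooled)))  # (d,) sigmoid gate per feature
    return a_f * pooled                       # re-weight each hidden dimension
```

The point of the sketch is that every frame contributes in proportion to its learned relevance, instead of the utterance being represented by the last time step alone.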
