IEEE/ACM Transactions on Audio, Speech, and Language Processing

Speech Emotion Classification Using Attention-Based LSTM


Abstract

Automatic speech emotion recognition has been a research hotspot in human-computer interaction over the past decade. However, because the inherent temporal relationships within the speech waveform have received little attention, current recognition accuracy still needs improvement. To make full use of the differences in emotional saturation between time frames, a novel method for speech emotion recognition is proposed that combines frame-level speech features with attention-based long short-term memory (LSTM) recurrent neural networks. Frame-level speech features are extracted from the waveform to replace traditional statistical features, so that the timing relations of the original speech are preserved through the sequence of frames. To distinguish emotional saturation across frames, two attention-based improvement strategies for the LSTM are proposed: first, the computational complexity is reduced by modifying the forget gate of the traditional LSTM without sacrificing performance; second, at the final output of the LSTM, an attention mechanism is applied along both the time and the feature dimensions to extract task-relevant information, rather than using only the output of the last time step as in the traditional algorithm. Extensive experiments on the CASIA, eNTERFACE, and GEMEP emotion corpora demonstrate that the proposed approach outperforms the state-of-the-art algorithms reported to date.
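The abstract's second improvement, attention over both the time and the feature dimensions of the LSTM outputs, can be illustrated with a minimal NumPy sketch. This is an illustrative assumption, not the paper's exact formulation: `h` stands for the per-frame LSTM outputs, `w_t` is a hypothetical time-attention vector, and `w_f` a hypothetical feature-gating matrix with sigmoid gating.

```python
import numpy as np

def softmax(z, axis):
    # numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pool(h, w_t, w_f):
    """Pool frame-level LSTM outputs with time and feature attention.

    h   : (T, d) LSTM output for each of T frames
    w_t : (d,)   time-attention scoring weights (assumed)
    w_f : (d, d) feature-attention weights (assumed)
    Returns a single (d,) utterance-level vector for classification.
    """
    a_t = softmax(h @ w_t, axis=0)            # (T,) weight per frame
    pooled = a_t @ h                          # (d,) attention-weighted sum over time
    a_f = 1.0 / (1.0 + np.exp(-(w_f @ pooled)))  # (d,) sigmoid gate per feature
    return a_f * pooled                       # re-weight each hidden dimension
```

The point of the sketch is that every frame contributes in proportion to its learned relevance, instead of the utterance being represented by the last time step alone.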
