IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speech Emotion Recognition Using Deep Neural Network Considering Verbal and Nonverbal Speech Sounds

Abstract

Speech emotion recognition is becoming increasingly important for many applications. In real-life communication, nonverbal sounds within an utterance also play an important role in how people recognize emotion. However, few existing emotion recognition systems consider nonverbal sounds such as laughter, cries, or other emotional interjections, which occur naturally in daily conversation. In this work, both verbal and nonverbal sounds within an utterance were therefore considered for emotion recognition in real-life conversations. First, an SVM-based verbal/nonverbal sound detector was developed. A Prosodic Phrase (PPh) auto-tagger was then employed to extract the verbal/nonverbal segments. For each segment, emotion and sound features were extracted with convolutional neural networks (CNNs) and concatenated to form a CNN-based generic feature vector. Finally, the sequence of CNN-based feature vectors for an entire dialog turn was fed to an attentive long short-term memory (LSTM)-based sequence-to-sequence model, which outputs an emotion-label sequence as the recognition result. Experimental results on the recognition of seven emotional states in NNIME (the NTHU-NTUA Chinese interactive multimodal emotion corpus) showed that the proposed method achieved a detection accuracy of 52.00%, outperforming traditional methods.
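The segment-level feature construction described in the abstract can be sketched at the data-flow level. The sketch below stubs the paper's trained CNN extractors with hypothetical placeholder functions (the names, feature dimensions, and 16 kHz segment length are assumptions, not taken from the paper); it only illustrates how per-segment emotion and sound features are concatenated into one generic vector, and how a dialog turn becomes a sequence of such vectors ready for the LSTM stage.

```python
import numpy as np

# Assumed feature sizes (not specified in the abstract).
EMOTION_DIM, SOUND_DIM = 128, 128

def cnn_emotion_features(segment: np.ndarray) -> np.ndarray:
    # Stand-in for the trained emotion-feature CNN: returns a fixed-size vector.
    return np.zeros(EMOTION_DIM)

def cnn_sound_features(segment: np.ndarray) -> np.ndarray:
    # Stand-in for the trained sound-feature CNN.
    return np.zeros(SOUND_DIM)

def segment_feature(segment: np.ndarray) -> np.ndarray:
    # Concatenate the two CNN outputs into one generic feature vector,
    # as the paper does for each verbal/nonverbal segment.
    return np.concatenate([cnn_emotion_features(segment),
                           cnn_sound_features(segment)])

# A dialog turn = a sequence of verbal/nonverbal segments
# (here: five dummy 1-second waveforms at an assumed 16 kHz rate).
turn = [np.zeros(16000) for _ in range(5)]
features = np.stack([segment_feature(s) for s in turn])
print(features.shape)  # (5, 256): one 256-dim vector per segment
```

The resulting `(num_segments, 256)` array is the kind of sequence an attentive LSTM sequence-to-sequence model would consume, emitting one emotion label per segment.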