Speech Emotion Recognition using Convolutional Neural Network with Audio Word-based Embedding


Abstract

A complete emotional expression typically contains a complex temporal course in a natural conversation. Related research on utterance-level, segment-level and multi-level processing lacks understanding of the underlying relation of emotional speech. In this work, a convolutional neural network (CNN) with audio word-based embedding is proposed for emotion modeling. Vector quantization is first applied to convert the low-level features of each speech frame into audio words using the k-means algorithm. Word2vec is then adopted to convert an input speech utterance into the corresponding audio word vector sequence. Finally, the audio word vector sequences of the emotion-annotated training speech data are used to construct the CNN-based emotion model. The NCKU-ES database, containing seven emotion categories (happiness, boredom, anger, anxiety, sadness, surprise and disgust), was collected, and five-fold cross validation was used to evaluate the performance of the proposed CNN-based method for speech emotion recognition. Experimental results show that the proposed method achieved an emotion recognition accuracy of 82.34%, an improvement of 8.7% over the Long Short-Term Memory (LSTM)-based method, which faced the challenging issue of long input sequences. Compared with raw features, the audio word-based embedding achieved an improvement of 3.4% for speech emotion recognition.
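The abstract outlines a three-stage pipeline: k-means vector quantization of frame-level features into audio words, Word2vec embedding of the resulting audio word sequences, and a CNN classifier over the embedded sequences. The sketch below illustrates that pipeline under stated assumptions; the codebook size, embedding dimension, sequence length, network layout, and all function names (build_codebook, utterance_to_audio_words, etc.) are illustrative choices, not the configuration reported in the paper.

```python
# Minimal sketch of the audio word-based embedding pipeline described above.
# Feature extraction, codebook size, embedding dimension and the CNN layout
# are illustrative assumptions, not the authors' exact configuration.
import numpy as np
from sklearn.cluster import KMeans
from gensim.models import Word2Vec
import torch
import torch.nn as nn

# --- Step 1: vector quantization of frame-level features into audio words ---
def build_codebook(frame_features, n_words=256):
    """frame_features: (num_frames, feat_dim) low-level descriptors pooled
    over the training set. Returns a fitted k-means codebook."""
    return KMeans(n_clusters=n_words, random_state=0).fit(frame_features)

def utterance_to_audio_words(codebook, utterance_frames):
    """Map each frame of one utterance to the id of its nearest centroid."""
    return [f"w{idx}" for idx in codebook.predict(utterance_frames)]

# --- Step 2: Word2vec embedding of the audio word sequences ---
def train_audio_word2vec(word_sequences, dim=64):
    """word_sequences: list of audio-word lists, one per training utterance."""
    return Word2Vec(sentences=word_sequences, vector_size=dim,
                    window=5, min_count=1, sg=1)

def embed_sequence(w2v, words, max_len=300):
    """Stack word vectors into a fixed-length (max_len, dim) matrix,
    zero-padding or truncating as needed."""
    mat = np.zeros((max_len, w2v.vector_size), dtype=np.float32)
    for i, w in enumerate(words[:max_len]):
        mat[i] = w2v.wv[w]
    return mat

# --- Step 3: CNN emotion classifier over the embedded sequence ---
class EmotionCNN(nn.Module):
    def __init__(self, dim=64, n_classes=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(dim, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1))         # global max pooling over time
        self.fc = nn.Linear(128, n_classes)  # seven emotion categories

    def forward(self, x):                    # x: (batch, max_len, dim)
        h = self.conv(x.transpose(1, 2)).squeeze(-1)
        return self.fc(h)
```

A standard training loop (cross-entropy loss over the seven classes, evaluated with five-fold cross validation as in the paper) would sit on top of this sketch and is omitted here.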
