MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach

Mustaqeem; Kwon Soonil


Abstract

Speech is the dominant means of communication among humans, and it is an efficient way to exchange information in human-computer interaction (HCI). Speech emotion recognition (SER) is an active research area that plays a crucial role in real-time applications, yet current SER systems lack real-time speech processing. To address this problem, we propose an end-to-end real-time SER model based on a one-dimensional dilated convolutional neural network (DCNN). Our model uses a multi-learning strategy to extract spatially salient emotional features and learn long-term contextual dependencies from the speech signals in parallel. We use a residual blocks with skip connection (RBSC) module to find correlations among the emotional cues, and a sequence learning (Seq_L) module to learn the long-term contextual dependencies in the input features. A fusion layer then concatenates these learned features for the final emotion recognition task. The model structure is simple, and it is capable of automatically learning salient discriminative features from the speech signals. We evaluated the model on the benchmark IEMOCAP and EMO-DB datasets and obtained recognition accuracies of 73% and 90%, respectively. The experimental results indicate the significance and efficiency of the proposed model and show that it is well suited to the implementation of a real-time SER system. Hence, our model is capable of processing raw speech signals for emotion recognition using a lightweight dilated CNN architecture that implements the multi-learning trick (MLT) approach.
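The structure described in the abstract (a dilated 1D convolution inside a residual block, a parallel sequence branch, and a fusion step that concatenates both outputs) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no layer counts, kernel sizes, dilation rates, or the type of sequence layer, so all function names and parameter values below are hypothetical, and the sequence branch is replaced by a simple exponential moving average as a stand-in for a learned recurrent layer.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int = 1) -> List[float]:
    """Single-channel 1D dilated convolution (cross-correlation form).

    Zero padding keeps the output the same length as the input.
    Kernel taps are spaced `dilation` samples apart.
    """
    k = len(kernel)
    span = dilation * (k - 1)          # extra samples covered by the kernel
    pad_left = span // 2
    padded = [0.0] * pad_left + list(x) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]


def residual_block(x: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Dilated convolution followed by ReLU, then the skip connection x + f(x)."""
    activated = [max(0.0, v) for v in dilated_conv1d(x, kernel, dilation)]
    return [a + b for a, b in zip(x, activated)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples seen by one output of stacked dilated layers."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)


def sequence_branch(x: Sequence[float], alpha: float = 0.5) -> List[float]:
    """ASSUMPTION: an exponential moving average stands in for the Seq_L module.

    It carries context forward through a running state, as a recurrent
    layer would, but it has no learned weights.
    """
    state = 0.0
    out = []
    for v in x:
        state = (1 - alpha) * state + alpha * v
        out.append(state)
    return out


def fuse(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Fusion by concatenating the feature vectors of the two branches."""
    return list(a) + list(b)
```

Calling `fuse(residual_block(x, kernel, dilation), sequence_branch(x))` on the same input `x` mirrors the parallel layout: both branches see the raw signal, and a classifier would operate on the concatenated vector. With a kernel size of 3 and the hypothetical dilation rates 1, 2, and 4, `receptive_field(3, [1, 2, 4])` gives 15 samples, which shows how dilation widens the context without adding weights.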


Bibliographic record

Source
Expert Systems with Applications | 2021, Issue 4 | 114177.1-114177.12 | 12 pages
Authors
Mustaqeem; Kwon Soonil
Affiliation
Sejong Univ, Dept Software, Interact Technol Lab, Seoul 05006, South Korea (both authors)
Original format: PDF
Language: English
Keywords
Affective computing; Dilated convolutional neural network; Real-time speech emotion recognition; Parallel learning; Multi-learning trick (MLT); Raw audio clips


