Journal: Expert Systems with Applications

MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach



Abstract

Speech is the dominant means of communication among humans, and it is an efficient way to exchange information in human-computer interaction (HCI). Speech emotion recognition (SER) is currently an active research area that plays a crucial role in real-time applications. However, existing SER systems lack real-time speech processing. To address this problem, we propose an end-to-end real-time SER model based on a one-dimensional dilated convolutional neural network (DCNN). Our model uses a multi-learning strategy to extract salient spatial emotional features and to learn long-term contextual dependencies from speech signals in parallel. We use a residual blocks with skip connection (RBSC) module to find correlations among emotional cues, and a sequence learning (Seq_L) module to learn the long-term contextual dependencies in the input features. A fusion layer then concatenates these learned features for the final emotion recognition task. The model structure is simple, and it automatically learns salient discriminative features from the speech signals. We evaluated our model on the benchmark IEMOCAP and EMO-DB datasets and obtained recognition accuracies of 73% and 90%, respectively. The experimental results indicate the significance and efficiency of the proposed model and show that it can substantially support the implementation of a real-time SER system. Hence, our model can process raw speech signals for emotion recognition using a lightweight dilated CNN architecture that implements the multi-learning trick (MLT) approach.
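The abstract names three building blocks: 1D dilated convolutions, residual blocks with skip connections, and a fusion layer that concatenates the outputs of two parallel branches. The following is a minimal, framework-free Python sketch of those ideas under stated assumptions. The causal zero-padding, the hand-picked kernel weights, the average/max pooling, and the one-unit tanh recurrence standing in for the Seq_L module are all hypothetical; the abstract does not specify them, and this is not the authors' implementation.

```python
"""Toy, framework-free sketch of the building blocks named in the abstract.

Hypothetical illustration only: causal padding, hand-picked weights, the
pooling choices, and the tanh recurrence standing in for Seq_L are
assumptions, not details of MLT-DNet.
"""
import math
from typing import List

Signal = List[float]


def dilated_conv1d(x: Signal, kernel: Signal, dilation: int) -> Signal:
    """Causal 1D dilated convolution (cross-correlation, as in deep learning).

    out[t] = sum_k kernel[k] * x[t - k * dilation]; samples before the start
    of the signal are treated as zero, so the output has the same length as x.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input samples one output sees after stacking dilated convs."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def relu(x: Signal) -> Signal:
    return [max(0.0, v) for v in x]


def residual_block(x: Signal, kernel: Signal, dilation: int) -> Signal:
    """y = x + ReLU(dilated_conv(x)); the skip connection adds the input back."""
    h = relu(dilated_conv1d(x, kernel, dilation))
    return [a + b for a, b in zip(x, h)]


def simple_rnn_last_state(x: Signal, w_in: float = 0.5, w_rec: float = 0.9) -> float:
    """Hypothetical one-unit recurrence h_t = tanh(w_rec*h_{t-1} + w_in*x_t)."""
    h = 0.0
    for v in x:
        h = math.tanh(w_rec * h + w_in * v)
    return h


def fuse(spatial_feats: List[float], seq_feats: List[float]) -> List[float]:
    """Fusion layer as plain concatenation of the two branch outputs."""
    return spatial_feats + seq_feats


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]

    # Branch 1 (RBSC-like): stacked residual blocks with growing dilation.
    x = signal
    for d in (1, 2, 4):
        x = residual_block(x, kernel=[0.5, 0.5], dilation=d)
    spatial = [sum(x) / len(x), max(x)]  # global average and max pooling

    # Branch 2 (Seq_L stand-in): runs on the same input in parallel.
    seq = [simple_rnn_last_state(signal)]

    features = fuse(spatial, seq)
    print(len(features), receptive_field(2, [1, 2, 4]))  # prints "3 8"
```

Stacking convolutions with kernel size K and dilations d_1, ..., d_L gives a receptive field of 1 + (K - 1)(d_1 + ... + d_L) samples, so with kernel size 2 and dilations 1, 2, 4 each output covers 8 input samples even though every layer has only two taps. This is why dilation lets a shallow, lightweight network see long stretches of a waveform. In a real model the weights would be learned; PyTorch's `torch.nn.Conv1d`, for example, exposes a `dilation` parameter for this purpose.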
