IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition



Abstract

Speech emotion recognition (SER) has long been a complex and difficult task due to emotional complexity. In this paper, we propose a multitask deep learning approach based on a cascaded attention network and a self-adaption loss for SER. First, non-personalized features are extracted to represent the process of emotion change while reducing the influence of external variables. Second, to highlight salient speech emotion features, a cascaded attention network is proposed, in which spatial-temporal attention effectively locates the regions of speech that express emotion, while self-attention reduces the dependence on external information. Finally, the influence of differences in gender and in human perception of external information is alleviated by a multitask learning strategy, where a self-adaption loss is introduced to determine the weights of the different tasks dynamically. Experimental results on the IEMOCAP dataset demonstrate that our method gains absolute improvements of 1.97 and 0.91 over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
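
To make the cascaded attention idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: spatial attention and frame-level temporal attention applied to convolutional feature maps, followed by self-attention over the weighted frame sequence. The module name CascadedAttention, the layer sizes, and the pooling choices are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a cascaded attention block: spatial and temporal
# attention over spectrogram-like feature maps, then self-attention over time.
# Layer sizes and ordering are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class CascadedAttention(nn.Module):
    def __init__(self, channels: int = 64, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Spatial attention: 1x1 conv producing a per-position saliency map.
        self.spatial = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        # Temporal attention: scores each frame after pooling over frequency.
        self.temporal = nn.Linear(channels, 1)
        self.proj = nn.Linear(channels, d_model)
        # Self-attention over the attention-weighted frame sequence.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, channels, freq, time)
        x = x * self.spatial(x)                 # emphasize emotion-salient regions
        frames = x.mean(dim=2).transpose(1, 2)  # (batch, time, channels)
        w = torch.softmax(self.temporal(frames), dim=1)
        frames = frames * w                     # weight frames by temporal attention
        h = self.proj(frames)
        out, _ = self.self_attn(h, h, h)        # (batch, time, d_model)
        return out

feats = torch.randn(8, 64, 40, 100)             # toy batch of conv feature maps
print(CascadedAttention()(feats).shape)         # torch.Size([8, 100, 128])
```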
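The abstract's self-adaption loss dynamically weights the emotion-recognition task against an auxiliary task such as gender classification. As a hedged illustration only, the sketch below uses homoscedastic-uncertainty-style learnable task weights, a common dynamic-weighting scheme; the paper's actual formulation may differ, and the name SelfAdaptionLoss and the two task heads are hypothetical.

```python
# Hypothetical sketch of dynamic task weighting for multitask SER
# (emotion + gender classification); the paper's actual loss may differ.
import torch
import torch.nn as nn

class SelfAdaptionLoss(nn.Module):
    """Combines per-task losses with learnable weights that adapt during training."""
    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # One log-variance parameter per task, optimized jointly with the network.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = 0.0
        for i, loss in enumerate(task_losses):
            precision = torch.exp(-self.log_vars[i])
            # Down-weight noisier tasks; the log-variance term acts as a regularizer.
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage: emotion and gender heads would share a cascaded-attention encoder (not shown).
criterion = nn.CrossEntropyLoss()
adaption = SelfAdaptionLoss(num_tasks=2)

emotion_logits = torch.randn(8, 4, requires_grad=True)   # e.g. 4 emotion classes
gender_logits = torch.randn(8, 2, requires_grad=True)    # 2 gender classes
emotion_labels = torch.randint(0, 4, (8,))
gender_labels = torch.randint(0, 2, (8,))

loss = adaption([criterion(emotion_logits, emotion_labels),
                 criterion(gender_logits, gender_labels)])
loss.backward()
```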
