Journal of Signal Processing Systems for Signal, Image, and Video Technology

Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning



Abstract

Speech emotion recognition is challenging because emotion has no precise definition and its feature representation is complex. Accurate feature representation is one of the key factors in successful speech emotion recognition. Studies have shown that 3D data composed of the static, delta, and delta-delta coefficients of the log-Mel spectrum is very effective at filtering out irrelevant features. The difficulty of speech emotion recognition is also reflected in the need for fine-grained classification: typical applications of affective computing, such as psychological counseling and emotion regulation, require fine-grained emotion recognition. Motivated by these two observations, this paper proposes an end-to-end hierarchical multi-task learning framework that proceeds from coarse to fine to achieve fine-grained emotion recognition. Using the 3D data as input, the first stage is trained on the coarse emotion type, and its result is then used to assist the second-stage training on the fine emotion type. Comparative experiments on the IEMOCAP corpus show that the coarse-to-fine classification approach yields a significant performance improvement over the baseline models.
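The two ideas in the abstract, the 3D input and the coarse-to-fine training, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the deltas are computed or how the coarse result feeds the second stage. Here `np.gradient` stands in for the delta computation, the coarse probabilities are simply concatenated to the features before the fine classifier, and the function names, the weight matrices `w_coarse` and `w_fine`, and the loss weight `alpha` are all hypothetical.

```python
import numpy as np

def stack_static_delta_delta2(log_mel):
    """Stack a (n_mels, n_frames) log-Mel spectrogram with its first and
    second time differences into a (3, n_mels, n_frames) array."""
    log_mel = np.asarray(log_mel, dtype=float)
    # Assumption: a plain finite difference along time approximates the delta.
    delta = np.gradient(log_mel, axis=1)
    delta2 = np.gradient(delta, axis=1)
    return np.stack([log_mel, delta, delta2], axis=0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def coarse_to_fine_forward(features, w_coarse, w_fine):
    """Hypothetical two-stage head on (batch, dim) features.
    The coarse class probabilities are appended to the features so the
    fine classifier can condition on the coarse prediction."""
    p_coarse = softmax(features @ w_coarse)
    fine_input = np.concatenate([features, p_coarse], axis=-1)
    p_fine = softmax(fine_input @ w_fine)
    return p_coarse, p_fine

def multitask_loss(p_coarse, p_fine, y_coarse, y_fine, alpha=0.5):
    """Weighted sum of the coarse and fine cross-entropy losses.
    alpha is an illustrative weight, not a value from the paper."""
    rows = np.arange(len(y_coarse))
    ce_coarse = -np.log(p_coarse[rows, y_coarse]).mean()
    ce_fine = -np.log(p_fine[rows, y_fine]).mean()
    return alpha * ce_coarse + (1.0 - alpha) * ce_fine
```

For example, `stack_static_delta_delta2` applied to a 40 x 300 log-Mel array returns an array of shape (3, 40, 300), which a convolutional front end could treat like a three-channel image. In a full model the `features` passed to `coarse_to_fine_forward` would come from such a front end rather than from raw spectrogram values.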
