IEEE International Conference on Acoustics, Speech and Signal Processing

Speaker-Invariant Affective Representation Learning via Adversarial Training



Abstract

Representation learning for speech emotion recognition is challenging due to labeled-data sparsity and the lack of gold-standard references. In addition, there is considerable variability in the input speech signals, in human subjective perception of those signals, and in the ambiguity of emotion labels. In this paper, we propose a machine learning framework that obtains speech emotion representations by limiting the effect of speaker variability in the speech signals. Specifically, we propose to disentangle speaker characteristics from emotion through an adversarial training network in order to better represent emotion. Our method combines the gradient reversal technique with an entropy loss function to remove such speaker information. Our approach is evaluated on both the IEMOCAP and CMU-MOSEI datasets. We show that our method improves speech emotion classification and generalizes better to unseen speakers.
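The core mechanism in the abstract, gradient reversal, passes features through unchanged in the forward pass but negates (and scales) the gradient flowing back from the speaker classifier, so the encoder is pushed to *destroy* speaker information while the classifier tries to recover it. The toy NumPy sketch below illustrates only this mechanism on a single feature vector; it is not the authors' implementation, and the weights, feature, and learning rate are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of a probability vector."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def grad_reversal(grad, lam=1.0):
    """Gradient reversal layer: identity in the forward pass; in the
    backward pass the incoming gradient is multiplied by -lam before
    it reaches the encoder."""
    return -lam * grad

# Hypothetical fixed linear speaker classifier W on top of a feature h.
W = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # speaker-classifier weights (illustrative)
h = np.array([1.0, 0.0])     # encoder feature for one utterance
y = np.array([1.0, 0.0])     # one-hot speaker label

p_before = softmax(h @ W)
# Cross-entropy gradient w.r.t. the feature h ...
grad_h = (p_before - y) @ W.T
# ... reversed before the encoder-side SGD step (lr = 1.0 here).
h = h - 1.0 * grad_reversal(grad_h)
p_after = softmax(h @ W)
# The reversed update makes the speaker posterior LESS confident
# (higher entropy): the feature now carries less speaker information.
```

The paper's entropy loss plays a complementary role: rather than only flipping the classifier's gradient, it explicitly drives the speaker posterior toward the uniform distribution (maximum entropy), which the `entropy` helper above would measure.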
