IEEE Transactions on Emerging Topics in Computational Intelligence

Learning Class-Aligned and Generalized Domain-Invariant Representations for Speech Emotion Recognition


Abstract

Although recent research on speech emotion recognition has demonstrated that learning domain-invariant features provides an elegant solution to domain mismatch, the features learned by existing methods lack the generalization capability to capture latent information from datasets. We propose two novel domain adaptation methods, the generalized domain adversarial neural network (GDANN) and the class-aligned GDANN (CGDANN), to learn generalized domain-invariant representations for emotion recognition. GDANN and CGDANN, which are derived from multitask learning (MTL), consist of three tasks. The main task is to recognize the emotional category to which the input belongs. The remaining two are auxiliary tasks. One uses a variational autoencoder to model the input distribution, which encourages the model to learn the distribution of latent representations. The other learns representations common to the different domains that the domain classifier finds difficult to distinguish: the gradient of the domain classifier, passed through a gradient reversal layer, guides the shared representations of the source and target domains to approximate each other. To evaluate the effectiveness of the proposed methods, we conduct several experiments on the IEMOCAP and MSP-IMPROV datasets. The results show that the proposed methods achieve good performance compared with state-of-the-art methods. Notably, CGDANN uses a small quantity of labeled target-domain samples to align the distribution of representations and obtains the best performance among the compared methods. We further visualize the representations learned by the proposed methods and find that the representations of the source and target domains converge with low variance.
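
As a rough illustration of the multitask architecture outlined in the abstract (a shared encoder feeding a VAE reconstruction branch, an emotion classifier as the main task, and a domain classifier trained through a gradient reversal layer), a minimal PyTorch sketch follows. The layer sizes, feature dimension, loss weights, and class counts are assumptions for illustration, not the authors' GDANN/CGDANN configuration.

```python
# Minimal sketch of a domain-adversarial, VAE-regularized SER model (assumed sizes).
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Gradient reversal layer: identity on the forward pass,
    negated (scaled by lambda) gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainAdversarialSER(nn.Module):
    def __init__(self, input_dim=384, latent_dim=64, num_emotions=4, num_domains=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        # Auxiliary task 1: VAE branch modeling the input distribution.
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))
        # Main task: emotion classification on the shared latent representation.
        self.emotion_clf = nn.Linear(latent_dim, num_emotions)
        # Auxiliary task 2: domain classification behind the gradient reversal layer.
        self.domain_clf = nn.Linear(latent_dim, num_domains)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x, lambd=1.0):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        recon = self.decoder(z)
        emotion_logits = self.emotion_clf(z)
        domain_logits = self.domain_clf(GradReverse.apply(z, lambd))
        return recon, mu, logvar, emotion_logits, domain_logits

def total_loss(recon, x, mu, logvar, emo_logits, emo_y, dom_logits, dom_y):
    """Illustrative unweighted sum: emotion cross-entropy (main task),
    VAE reconstruction + KL, and domain cross-entropy, whose gradient is
    reversed before reaching the encoder so the shared representation
    becomes hard to separate by domain."""
    ce = nn.functional.cross_entropy
    recon_loss = nn.functional.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce(emo_logits, emo_y) + recon_loss + kl + ce(dom_logits, dom_y)
```

In this reading, the reconstruction and KL terms encourage the latent space to model the input distribution, while the reversed domain-classifier gradient pushes source and target representations toward each other; a class-aligned variant would additionally apply the emotion loss to the few labeled target-domain samples.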
