IEEE Transactions on Emerging Topics in Computational Intelligence

Learning Class-Aligned and Generalized Domain-Invariant Representations for Speech Emotion Recognition


Abstract

Although recent research on speech emotion recognition has demonstrated that learning domain-invariant features provides an elegant solution to domain mismatch, the features learned by existing methods lack the generalization capability to capture latent information from datasets. We propose two novel domain adaptation methods, the generalized domain adversarial neural network (GDANN) and the class-aligned GDANN (CGDANN), to learn generalized domain-invariant representations for emotion recognition. GDANN and CGDANN, which are derived from multitask learning (MTL), consist of three tasks. The main task is to recognize the emotional category to which the input belongs. The remaining two are auxiliary tasks. One uses a variational autoencoder to model the input distribution, which encourages the model to learn the distribution of latent representations. The other learns representations common to the different domains that the domain classifier finds difficult to distinguish: the gradient of the domain classifier, passed through a gradient reversal layer, guides the shared representations of the source and target domains to approximate each other. To evaluate the effectiveness of the proposed methods, we conduct several experiments on the IEMOCAP and MSP-IMPROV datasets. The results show that the proposed methods achieve good performance compared with state-of-the-art methods. Notably, CGDANN uses a small quantity of labeled target-domain samples to align the distribution of representations and obtains the best performance among the compared methods. We further visualize the representations learned by the proposed methods and find that the representations of the source and target domains converge with low variance.
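
As a rough illustration of the multitask architecture outlined in the abstract (a shared encoder feeding a VAE reconstruction branch, an emotion classifier as the main task, and a domain classifier trained through a gradient reversal layer), a minimal PyTorch sketch follows. The layer sizes, feature dimension, loss weights, and class counts are assumptions for illustration, not the authors' GDANN/CGDANN configuration.

```python
# Minimal sketch of a domain-adversarial, VAE-regularized SER model (assumed sizes).
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Gradient reversal layer: identity on the forward pass,
    negated (scaled by lambda) gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainAdversarialSER(nn.Module):
    def __init__(self, input_dim=384, latent_dim=64, num_emotions=4, num_domains=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        # Auxiliary task 1: VAE branch modeling the input distribution.
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))
        # Main task: emotion classification on the shared latent representation.
        self.emotion_clf = nn.Linear(latent_dim, num_emotions)
        # Auxiliary task 2: domain classification behind the gradient reversal layer.
        self.domain_clf = nn.Linear(latent_dim, num_domains)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x, lambd=1.0):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        recon = self.decoder(z)
        emotion_logits = self.emotion_clf(z)
        domain_logits = self.domain_clf(GradReverse.apply(z, lambd))
        return recon, mu, logvar, emotion_logits, domain_logits

def total_loss(recon, x, mu, logvar, emo_logits, emo_y, dom_logits, dom_y):
    """Illustrative unweighted sum: emotion cross-entropy (main task),
    VAE reconstruction + KL, and domain cross-entropy, whose gradient is
    reversed before reaching the encoder so the shared representation
    becomes hard to separate by domain."""
    ce = nn.functional.cross_entropy
    recon_loss = nn.functional.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce(emo_logits, emo_y) + recon_loss + kl + ce(dom_logits, dom_y)
```

In this reading, the reconstruction and KL terms encourage the latent space to model the input distribution, while the reversed domain-classifier gradient pushes source and target representations toward each other; a class-aligned variant would additionally apply the emotion loss to the few labeled target-domain samples.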
