International Conference on Artificial Neural Networks

Variational Autoencoder with Global- and Medium Timescale Auxiliaries for Emotion Recognition from Speech



Abstract

Unsupervised learning is based on the idea of self-organization: finding hidden patterns and features in data without the need for labels. Variational autoencoders (VAEs) are generative unsupervised learning models that create low-dimensional representations of the input data and learn by regenerating the same input from those representations. Recently, VAEs have been used to extract representations from audio data, which carry not only content-dependent information but also speaker-dependent information such as gender, health status, and speaker ID. VAEs with two timescale variables were then introduced to disentangle these two kinds of information from each other. Our approach introduces a third, medium timescale into the VAE. Instead of holding only a global and a local timescale variable, this model holds a global, a medium, and a local variable. We tested the model on three downstream applications: speaker identification, gender classification, and emotion recognition, where each hidden representation outperformed the others on specific tasks. Speaker ID and gender were best captured by the global variable, while emotion was best extracted using the medium variable. Our model achieves excellent results, exceeding state-of-the-art models on speaker identification and emotion regression from audio.
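The key structural idea in the abstract is that latent variables are sampled at three different rates: one global latent per utterance, one medium latent per segment, and one local latent per frame, with the decoder conditioning every frame on all three. The sketch below illustrates only that timescale structure with NumPy and the standard VAE reparameterization trick; all shapes and names (frame count, segment length, latent size) are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical sizes: a 100-frame utterance split into 10-frame segments.
n_frames, seg_len, d = 100, 10, 16
n_segments = n_frames // seg_len

# One latent per timescale (zero means/log-variances stand in for encoder outputs):
# global  -> one vector per utterance (speaker ID, gender)
# medium  -> one vector per segment   (emotion)
# local   -> one vector per frame     (phonetic content)
z_global = reparameterize(np.zeros(d), np.zeros(d), rng)                              # (16,)
z_medium = reparameterize(np.zeros((n_segments, d)), np.zeros((n_segments, d)), rng)  # (10, 16)
z_local = reparameterize(np.zeros((n_frames, d)), np.zeros((n_frames, d)), rng)       # (100, 16)

# A decoder would condition each frame on all three timescales:
# broadcast the global latent to every frame, repeat each medium latent
# over the frames of its segment, and use the local latent as-is.
cond = np.concatenate(
    [
        np.broadcast_to(z_global, (n_frames, d)),
        np.repeat(z_medium, seg_len, axis=0),
        z_local,
    ],
    axis=1,
)
print(cond.shape)  # (100, 48): per-frame input combining all three timescales
```

Downstream classifiers would then read speaker ID or gender from `z_global` and emotion from `z_medium`, matching the task assignments reported in the abstract.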


