International Conference on Artificial Neural Networks

Variational Autoencoder with Global- and Medium Timescale Auxiliaries for Emotion Recognition from Speech



Abstract

Unsupervised learning is based on the idea of self-organization: finding hidden patterns and features in data without the need for labels. Variational autoencoders (VAEs) are generative unsupervised learning models that create low-dimensional representations of the input data and learn by regenerating the same input from those representations. Recently, VAEs have been used to extract representations from audio data, which carry not only content-dependent information but also speaker-dependent information such as gender, health status, and speaker ID. VAEs with two timescale variables were then introduced to disentangle these two kinds of information from each other. Our approach introduces a third, medium timescale into the VAE. Instead of holding only a global and a local timescale variable, this model holds a global, a medium, and a local variable. We tested the model on three downstream applications: speaker identification, gender classification, and emotion recognition, where each hidden representation outperformed the others on specific tasks. Speaker ID and gender were best captured by the global variable, while emotion was best extracted using the medium variable. Our model achieves excellent results, exceeding state-of-the-art models on speaker identification and emotion regression from audio.
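The key structural idea in the abstract is that latent variables are sampled at three different rates: one global latent per utterance, one medium latent per segment, and one local latent per frame, with the decoder conditioning every frame on all three. The sketch below illustrates only that timescale structure with NumPy and the standard VAE reparameterization trick; all shapes and names (frame count, segment length, latent size) are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    # Standard VAE reparameterization: z = mu + sigma * eps, eps ~ N(0, I)
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

# Hypothetical sizes: a 100-frame utterance split into 10-frame segments.
n_frames, seg_len, d = 100, 10, 16
n_segments = n_frames // seg_len

# One latent per timescale (zero means/log-variances stand in for encoder outputs):
# global  -> one vector per utterance (speaker ID, gender)
# medium  -> one vector per segment   (emotion)
# local   -> one vector per frame     (phonetic content)
z_global = reparameterize(np.zeros(d), np.zeros(d), rng)                              # (16,)
z_medium = reparameterize(np.zeros((n_segments, d)), np.zeros((n_segments, d)), rng)  # (10, 16)
z_local = reparameterize(np.zeros((n_frames, d)), np.zeros((n_frames, d)), rng)       # (100, 16)

# A decoder would condition each frame on all three timescales:
# broadcast the global latent to every frame, repeat each medium latent
# over the frames of its segment, and use the local latent as-is.
cond = np.concatenate(
    [
        np.broadcast_to(z_global, (n_frames, d)),
        np.repeat(z_medium, seg_len, axis=0),
        z_local,
    ],
    axis=1,
)
print(cond.shape)  # (100, 48): per-frame input combining all three timescales
```

Downstream classifiers would then read speaker ID or gender from `z_global` and emotion from `z_medium`, matching the task assignments reported in the abstract.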


