International Conference on Human-Computer Interaction

Speech Emotion Recognition Integrating Paralinguistic Features and Auto-encoders in a Deep Learning Model


Abstract

Emotions play an extremely important role in human decisions and in interactions with both other humans and machines. This has promoted the development of methods that aim to recognize emotions from different physiological signals. In particular, emotion recognition from speech signals remains a research challenge because of the large variability of voices across subjects. In this work, paralinguistic features and deep learning models are used to perform speech emotion classification. A set of 1582 INTERSPEECH 2010 features is first extracted from the speech signals; these features are then fed to a deep convolutional stacked auto-encoder network that transforms them into a higher-level representation. A multilayer perceptron is then trained to classify each utterance into one of six emotions: anger, fear, disgust, happiness, surprise, and sadness. Four auto-encoder architectures of different sizes were compared in terms of performance, computational cost, and execution time to select the most suitable configuration. The proposed approach was evaluated in two stages. First, a 5-fold cross-validation strategy was performed using 70% of the samples. Then, the best network architecture was used to evaluate the classification on a validation set composed of the remaining 30% of the samples. The results show an overall accuracy of 91.4% in the 5-fold testing stage and 61.1% on the validation set.
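The two-stage evaluation protocol described above (5-fold cross-validation on 70% of the samples, then a single test on the untouched 30%) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the 70/30 split is stratified by emotion and that the 5-fold split is plain (unstratified); the abstract states neither. It also replaces the convolutional stacked auto-encoder and MLP with a hypothetical majority-class stand-in (`majority_baseline`), so that only the data-splitting logic is shown. All function names are illustrative.

```python
import random
from collections import Counter, defaultdict

EMOTIONS = ["anger", "fear", "disgust", "happiness", "surprise", "sadness"]


def stratified_holdout(labels, holdout_frac=0.3, seed=0):
    """Split indices into a development set (~70%) and a held-out validation
    set (~30%). Assumption: each emotion class is split separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    dev, val = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * holdout_frac)
        val.extend(idxs[:n_val])
        dev.extend(idxs[n_val:])
    return sorted(dev), sorted(val)


def k_fold(indices, k=5, seed=0):
    """Yield (train, test) index lists for plain (unstratified) k-fold CV."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels, n_test):
    """Hypothetical stand-in for the auto-encoder + MLP model:
    always predicts the most frequent training label."""
    top = Counter(train_labels).most_common(1)[0][0]
    return [top] * n_test


def evaluate(labels, k=5):
    dev, val = stratified_holdout(labels)
    # Stage 1: k-fold cross-validation on the development portion only.
    fold_accs = []
    for train, test in k_fold(dev, k=k):
        preds = majority_baseline([labels[i] for i in train], len(test))
        fold_accs.append(accuracy([labels[i] for i in test], preds))
    # Stage 2: train once on all development data, then score the held-out set.
    preds = majority_baseline([labels[i] for i in dev], len(val))
    val_acc = accuracy([labels[i] for i in val], preds)
    return sum(fold_accs) / len(fold_accs), val_acc


labels = [EMOTIONS[i % 6] for i in range(600)]  # synthetic, balanced labels
cv_acc, val_acc = evaluate(labels)
```

With 600 synthetic, balanced labels (100 per emotion), the held-out set contains 180 utterances and the constant baseline scores exactly 1/6 on it. To evaluate a real model, replace `majority_baseline`. Because the 30% portion is never used during architecture selection, the second figure serves as an independent estimate.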