Biomedical Signal Processing and Control

Speech emotion recognition with deep convolutional neural networks



Abstract

Speech emotion recognition (or classification) is one of the most challenging topics in data science. In this work, we introduce a new architecture that extracts mel-frequency cepstral coefficients, a chromagram, a mel-scale spectrogram, a Tonnetz representation, and spectral contrast features from sound files and uses them as inputs to a one-dimensional convolutional neural network for emotion identification, using samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin (EMO-DB), and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets. We refine our initial model incrementally to improve classification accuracy. Unlike some previous approaches, all of the proposed models work directly with raw sound data, without conversion to visual representations. Based on the experimental results, our best-performing model outperforms existing frameworks on RAVDESS and IEMOCAP, setting a new state of the art. On the EMO-DB dataset it outperforms all previous works except one, but compares favorably with that one in terms of generality, simplicity, and applicability. Specifically, in speaker-independent audio classification tasks, the proposed framework obtains 71.61% on RAVDESS with 8 classes, 86.1% on EMO-DB with 535 samples in 7 classes, 95.71% on EMO-DB with 520 samples in 7 classes, and 64.3% on IEMOCAP with 4 classes. (C) 2020 Elsevier Ltd. All rights reserved.
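The abstract describes feeding five acoustic feature sets (MFCCs, chromagram, mel spectrogram, Tonnetz, spectral contrast) into a 1D CNN. A common way to combine such features, as in many speech-emotion-recognition pipelines, is to mean-pool each feature matrix over time and concatenate the results into one fixed-length vector. The sketch below illustrates only this pooling step with synthetic stand-ins for the feature matrices; the matrix sizes (40 MFCCs, 12 chroma bins, 128 mel bands, 7 contrast bands, 6 Tonnetz dimensions) are typical defaults of audio libraries such as librosa, not values stated in the abstract, and `pool_features` is a hypothetical helper name.

```python
import numpy as np

def pool_features(feature_mats):
    """Mean-pool each (n_features, n_frames) matrix over the time axis
    and concatenate the per-feature means into one fixed-length vector."""
    return np.concatenate([m.mean(axis=1) for m in feature_mats])

# Toy stand-ins for the five feature matrices a library like librosa
# would return (each row is a feature, each column a time frame).
rng = np.random.default_rng(0)
sizes = (40, 12, 128, 7, 6)  # MFCC, chroma, mel, contrast, Tonnetz (assumed defaults)
mats = [rng.standard_normal((n, 100)) for n in sizes]

vec = pool_features(mats)
print(vec.shape)  # a single 193-dimensional input vector for the 1D CNN
```

Because pooling removes the time dimension, every utterance yields a vector of the same length regardless of its duration, which is what lets a fixed-input 1D CNN consume raw-audio-derived features directly.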


