Journal: Journal of ambient intelligence and humanized computing

Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition


Abstract

The accuracy of speech emotion recognition based on prosodic and voice quality features declines as the signal-to-noise ratio (SNR) of the speech signal decreases. In this paper, we propose novel sub-band spectral centroid weighted wavelet packet Cepstral coefficients (W-WPCC) for robust speech emotion recognition. The W-WPCC feature is computed by combining sub-band energies with sub-band spectral centroids via a weighting scheme, yielding noise-robust acoustic features. Deep belief networks (DBNs) are artificial neural networks with more than one hidden layer, which are first pre-trained layer by layer and then fine-tuned using the back-propagation algorithm. Well-trained deep neural networks can model complex, non-linear features of the input training data and can better predict the probability distribution over classification labels. We extracted prosodic features, voice quality features and wavelet packet Cepstral coefficients (WPCC) from the speech signals, combined them with W-WPCC, and fused them using DBNs. Experimental results on the Berlin emotional speech database show that the proposed fused feature with W-WPCC is better suited to speech emotion recognition under noisy conditions than other acoustic features, and that the proposed DBN feature learning structure combined with W-WPCC improves emotion recognition performance over the conventional emotion recognition method.
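The abstract describes W-WPCC only at a high level (sub-band energies combined with sub-band spectral centroids through a weighting scheme), so the following is a rough sketch under stated assumptions rather than the paper's method. Haar filters, the function names, the default parameters, and in particular the weight `1 + normalised centroid` are all placeholders chosen for illustration; the abstract does not specify any of them.

```python
import numpy as np

def haar_wavelet_packet(frame, levels):
    """Split a frame into 2**levels sub-bands in frequency order (Haar filters, an assumption)."""
    nodes = [np.asarray(frame, dtype=float)]
    for _ in range(levels):
        children = []
        for i, node in enumerate(nodes):
            low = (node[0::2] + node[1::2]) / np.sqrt(2.0)
            high = (node[0::2] - node[1::2]) / np.sqrt(2.0)
            # Odd-indexed nodes carry a mirrored spectrum, so swap to keep frequency order.
            children.extend([low, high] if i % 2 == 0 else [high, low])
        nodes = children
    return nodes

def subband_centroids(frame, n_bands, fs):
    """Spectral centroid of each equal-width band, normalised to 0..1 within the band."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, n_bands + 1)
    centroids = np.empty(n_bands)
    for m in range(n_bands):
        lo, hi = edges[m], edges[m + 1]
        upper = freqs <= hi if m == n_bands - 1 else freqs < hi
        mask = (freqs >= lo) & upper
        total = power[mask].sum()
        # Fall back to the band midpoint when the band holds no energy.
        c = (freqs[mask] * power[mask]).sum() / total if total > 0 else 0.5 * (lo + hi)
        centroids[m] = (c - lo) / (hi - lo)
    return centroids

def w_wpcc(frame, fs=16000, levels=3, n_ceps=6, eps=1e-10):
    """Centroid-weighted log sub-band energies followed by a DCT-II (hypothetical weighting)."""
    bands = haar_wavelet_packet(frame, levels)
    energies = np.array([np.sum(b ** 2) for b in bands])
    # Placeholder weighting: NOT taken from the paper.
    weights = 1.0 + subband_centroids(frame, len(bands), fs)
    log_energy = np.log(energies * weights + eps)
    m = np.arange(len(bands))
    k = np.arange(n_ceps)[:, None]
    dct_basis = np.cos(np.pi * k * (m + 0.5) / len(bands))
    return dct_basis @ log_energy
```

The frame length must be divisible by `2**levels`. Calling `w_wpcc(frame)` on a 256-sample frame returns a vector of `n_ceps` coefficients that could then be concatenated with other per-frame features before being passed to a classifier.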

Bibliographic details

  • Source
  • Author affiliations

    Southeast Univ, Lab Measurement & Control Complex Syst Engn, Nanjing, Jiangsu, Peoples R China|Southeast Univ, Sch Automat, Minist Educ, Nanjing 210096, Jiangsu, Peoples R China;

    Southeast Univ, Lab Measurement & Control Complex Syst Engn, Nanjing, Jiangsu, Peoples R China|Southeast Univ, Sch Automat, Minist Educ, Nanjing 210096, Jiangsu, Peoples R China;

    Southeast Univ, Lab Measurement & Control Complex Syst Engn, Nanjing, Jiangsu, Peoples R China|Southeast Univ, Sch Automat, Minist Educ, Nanjing 210096, Jiangsu, Peoples R China;

    Southeast Univ, Lab Measurement & Control Complex Syst Engn, Nanjing, Jiangsu, Peoples R China|Southeast Univ, Sch Automat, Minist Educ, Nanjing 210096, Jiangsu, Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification
  • Keywords

    Speech emotion recognition; Weighted wavelet packets Cepstral coefficients (W-WPCC); Feature fusion; Deep belief networks (DBNs);

