Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition

Huang Yongming; Tian Kexin; Wu Ao; Zhang Guobao

Abstract

The accuracy of speech emotion recognition based on prosody and voice quality features declines as the signal-to-noise ratio (SNR) of the speech signal decreases. In this paper, we propose novel sub-band spectral centroid weighted wavelet packet cepstral coefficients (W-WPCC) for robust speech emotion recognition. The W-WPCC feature is computed by combining sub-band energies with sub-band spectral centroids via a weighting scheme, yielding noise-robust acoustic features. Deep belief networks (DBNs) are artificial neural networks with more than one hidden layer; they are first pre-trained layer by layer and then fine-tuned with the back-propagation algorithm. Well-trained deep neural networks can model complex, non-linear features of the input training data and can better predict the probability distribution over classification labels. We extracted prosody features, voice quality features, and wavelet packet cepstral coefficients (WPCC) from the speech signals, combined them with W-WPCC, and fused them using DBNs. Experimental results on the Berlin emotional speech database show that the proposed fused feature with W-WPCC is better suited to speech emotion recognition under noisy conditions than other acoustic features, and that the proposed DBN feature-learning structure combined with W-WPCC improves emotion recognition performance over the conventional emotion recognition method.
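
The abstract does not say how the noisy test signals were produced. A common way to study recognition accuracy as SNR decreases is to add noise to clean speech, scaled to a target SNR, using SNR_dB = 10 log10(P_speech / P_noise). The following is a minimal sketch of that idea only, not the authors' procedure; the function names, the sine tone standing in for speech, and the Gaussian noise are all illustrative assumptions.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus rescaled noise so the mixture has the requested SNR.

    Since SNR_dB = 10 * log10(P_speech / P_noise), the noise is rescaled
    to have power P_speech / 10 ** (snr_db / 10) before it is added.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given the clean reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Toy data (hypothetical): a 200 Hz tone standing in for one second of
# speech at 16 kHz, and white Gaussian noise from a seeded generator.
rng = random.Random(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Check that each mixture lands at the SNR that was asked for.
for snr_db in (20, 10, 0, -5):
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 3))
```

Features such as prosody, voice quality, WPCC, or W-WPCC could then be extracted from each mixture to compare how they degrade as the SNR drops.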


Bibliographic details

Source
Journal of Ambient Intelligence and Humanized Computing | 2019, Issue 5 | pp. 1787-1798 | 12 pages
Authors
Huang Yongming; Tian Kexin; Wu Ao; Zhang Guobao
Author affiliations
Southeast Univ, Lab Measurement & Control Complex Syst Engn, Nanjing, Jiangsu, Peoples R China | Southeast Univ, Sch Automat, Minist Educ, Nanjing 210096, Jiangsu, Peoples R China (listed identically for each of the four authors)
Original format: PDF
Language: English
Keywords
Speech emotion recognition; Weighted wavelet packets Cepstral coefficients (W-WPCC); Feature fusion; Deep belief networks (DBNs)
