Speech emotion recognition using hybrid spectral-prosodic features of speech signal/glottal waveform, metaheuristic-based dimensionality reduction, and Gaussian elliptical basis function network classifier

Daneshfar Fatemeh; Kabudian Seyed Jahanshah; Neekabadi Abbas


Abstract

This paper proposes a hybrid speech emotion recognition (SER) system with three stages: feature extraction, dimensionality reduction, and feature classification. In the feature extraction stage, an information-rich spectral-prosodic hybrid feature vector is extracted for each frame. It comprises perceptual-spectral features, namely mel-frequency cepstral coefficients (MFCC), perceptual linear prediction coefficients (PLPC), and perceptual minimum variance distortionless response (PMVDR) coefficients, together with the prosodic feature of pitch (F0). This feature vector is extracted from both the speech signal and its glottal waveform. First- and second-order derivatives are then appended to form a high-dimensional hybrid feature vector. In the dimensionality reduction stage, the dimensionality of this vector is reduced using a newly proposed approach based on quantum-behaved particle swarm optimization (QPSO). The new QPSO algorithm, called pQPSO, uses a truncated Laplace distribution (TLD) to generate new particles, so that, unlike in standard QPSO, every generated solution (i.e. particle) lies within the valid range of the problem. The contraction-expansion (CE) factor of pQPSO is also selected adaptively. Using pQPSO, an optimal discriminative dimensionality reduction matrix (i.e. projection matrix) is estimated with emotion classification accuracy as the class-discriminative criterion. In the classification stage, the reduced-dimension vectors are fed into a Gaussian elliptical basis function (GEBF) neural network classifier to detect the emotion of the speech. To accelerate training of the GEBF classifier, a fast scaled conjugate gradient (SCG) algorithm is employed, which does not require the learning rate to be adjusted.
Finally, the proposed method is evaluated on three standard emotional speech databases: the Berlin Database of Emotional Speech (EMODB), Surrey Audio-Visual Expressed Emotion (SAVEE), and Interactive Emotional Dyadic Motion Capture (IEMOCAP). The experimental results show that the proposed method detects speech emotions more accurately than state-of-the-art methods. (C) 2020 Elsevier Ltd. All rights reserved.
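
The abstract states that pQPSO draws new particles from a truncated Laplace distribution so that every candidate stays inside the valid range, but it does not give the update rule. The following is a minimal sketch of that idea only, not the authors' algorithm. It relies on the fact that the standard QPSO position update (attractor plus or minus a scaled logarithm of a uniform variate) is a draw from a Laplace distribution centred on the attractor, and it restricts that draw to [lo, hi] by inverse-CDF sampling. The function names, the fixed `beta` parameter (the paper selects the CE factor adaptively), and the per-coordinate bounds are all assumptions made for illustration.

```python
import math
import random


def laplace_cdf(x, mu, b):
    """CDF of a Laplace distribution with location mu and scale b > 0."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def laplace_inv_cdf(u, mu, b):
    """Quantile function of the same distribution, for 0 < u < 1."""
    if u < 0.5:
        return mu + b * math.log(2.0 * u)
    return mu - b * math.log(2.0 * (1.0 - u))


def sample_truncated_laplace(mu, b, lo, hi, rng=random):
    """Draw one sample from Laplace(mu, b) restricted to [lo, hi]."""
    f_lo = laplace_cdf(lo, mu, b)
    f_hi = laplace_cdf(hi, mu, b)
    # Uniform draw over the probability mass that lies inside the bounds.
    u = f_lo + (f_hi - f_lo) * rng.random()
    # Keep u strictly inside (0, 1) so the logarithms stay finite.
    u = min(max(u, 1e-300), 1.0 - 2.0 ** -53)
    x = laplace_inv_cdf(u, mu, b)
    return min(max(x, lo), hi)  # guard against floating-point round-off


def qpso_step(position, attractor, mean_best, beta, lo, hi, rng=random):
    """One hypothetical bounded QPSO-style update of a single particle.

    beta plays the role of the contraction-expansion factor; it is held
    fixed here, which is a simplification relative to the paper.
    """
    new_position = []
    for x, p, m in zip(position, attractor, mean_best):
        scale = beta * abs(m - x)
        if scale == 0.0:
            # Degenerate distribution: all mass sits on the attractor.
            new_position.append(min(max(p, lo), hi))
        else:
            new_position.append(sample_truncated_laplace(p, scale, lo, hi, rng))
    return new_position
```

Calling `qpso_step([0.5, -0.3], [0.1, 0.2], [0.0, 0.0], 0.75, -1.0, 1.0)` returns a new two-dimensional position whose coordinates all lie in [-1, 1]. Sampling through the inverse CDF avoids the rejection loop or hard clipping that an unbounded Laplace draw would otherwise need.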


Bibliographic record

Source: Applied Acoustics | 2020, Issue 9 | 107360.1-107360.17 | 17 pages
Authors: Daneshfar Fatemeh; Kabudian Seyed Jahanshah; Neekabadi Abbas
Affiliation (all three authors): Razi University, Department of Computer Engineering and Information Technology, Kermanshah, Iran
Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords: Speech emotion recognition; Quantum-behaved particle swarm optimization; Gaussian elliptical basis function

