Journal: Applied Acoustics

Speech emotion recognition using hybrid spectral-prosodic features of speech signal/glottal waveform, metaheuristic-based dimensionality reduction, and Gaussian elliptical basis function network classifier

Abstract

This paper proposes a hybrid three-stage system for speech emotion recognition (SER), consisting of feature extraction, dimensionality reduction, and feature classification. At the feature extraction stage, an information-rich spectral-prosodic hybrid feature vector is extracted for each frame. It comprises perceptual-spectral features, namely mel-frequency cepstral coefficients (MFCC), perceptual linear prediction coefficients (PLPC), and perceptual minimum variance distortionless response (PMVDR) coefficients, along with the prosodic feature of pitch (i.e., F0). This feature vector is extracted from both the speech signal and its glottal waveform. The first- and second-order derivatives are then appended to form a high-dimensional hybrid feature vector. At the next stage, dimensionality reduction, the dimensionality of this feature vector is reduced using a newly proposed approach based on quantum-behaved particle swarm optimization (QPSO). The new QPSO algorithm (called pQPSO) uses a truncated Laplace distribution (TLD) to generate new particles, so that, unlike standard QPSO, all the solutions (i.e., particles) it produces lie within the valid range of the problem. The contraction-expansion (CE) factor of the proposed pQPSO is also selected adaptively. Using the proposed QPSO algorithm, an optimal discriminative dimensionality reduction matrix (i.e., projection matrix) is estimated, with emotion classification accuracy as the class-discriminative criterion. At the subsequent stage, the reduced-dimensionality feature vectors are fed into a Gaussian elliptical basis function (GEBF)-type neural network classifier to detect the emotion of the speech. To accelerate the training phase of the GEBF classifier, a fast scaled conjugate gradient (SCG) algorithm is employed, which does not require tuning of the learning rate. Finally, the proposed method is evaluated on three standard emotional speech databases: the Berlin Database of Emotional Speech (EMODB), Surrey Audio-Visual Expressed Emotion (SAVEE), and Interactive Emotional Dyadic Motion Capture (IEMOCAP). The experimental results show that the proposed method detects speech emotions more accurately than state-of-the-art methods. (C) 2020 Elsevier Ltd. All rights reserved.
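The abstract does not give the pQPSO update rule, so the following is only a minimal Python sketch of the general idea of drawing new particle coordinates from a truncated Laplace distribution so that they always fall inside the search bounds. It uses inverse-CDF sampling. The function names, the `beta` parameter, and the choice of location (a local attractor) and scale (`beta` times the distance to the mean best position, by analogy with the standard QPSO update) are illustrative assumptions, not the authors' implementation.

```python
import math
import random
import sys


def laplace_cdf(x, loc, scale):
    """CDF of a Laplace distribution with the given location and scale."""
    z = (x - loc) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def laplace_ppf(p, loc, scale):
    """Inverse CDF (quantile function) of the Laplace distribution, 0 < p < 1."""
    if p < 0.5:
        return loc + scale * math.log(2.0 * p)
    return loc - scale * math.log(2.0 * (1.0 - p))


def sample_truncated_laplace(loc, scale, lower, upper, rng=random):
    """Draw one sample from Laplace(loc, scale) restricted to [lower, upper]."""
    if not (scale > 0 and lower < upper):
        raise ValueError("need scale > 0 and lower < upper")
    p_lo = laplace_cdf(lower, loc, scale)
    p_hi = laplace_cdf(upper, loc, scale)
    # Map a uniform draw onto the probability mass that lies inside the bounds.
    p = p_lo + rng.random() * (p_hi - p_lo)
    # Keep p strictly inside (0, 1) so the logarithm in the quantile is defined.
    p = min(max(p, sys.float_info.min), 1.0 - sys.float_info.epsilon)
    x = laplace_ppf(p, loc, scale)
    # Final clip guards against floating-point round-off at the edges.
    return min(max(x, lower), upper)


def update_position(position, attractor, mean_best, beta, lower, upper, rng=random):
    """Hypothetical per-dimension QPSO-style move using the truncated sampler."""
    new_position = []
    for x, p, m in zip(position, attractor, mean_best):
        scale = beta * abs(m - x)  # assumed spread, as in standard QPSO
        if scale == 0.0:
            # Degenerate spread: stay at the attractor, clipped into range.
            new_position.append(min(max(p, lower), upper))
        else:
            new_position.append(sample_truncated_laplace(p, scale, lower, upper, rng))
    return new_position


if __name__ == "__main__":
    rng = random.Random(0)
    print(update_position([0.2, -0.9], [0.0, 0.5], [0.1, 0.1], 0.75, -1.0, 1.0, rng))
```

Sampling directly from the truncated distribution, rather than clipping or re-drawing out-of-range values, means every generated coordinate is valid by construction, which is the property the abstract attributes to pQPSO.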