Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition

Abstract

In this paper we propose a statistical parametrization framework that represents speech through a fixed-length supervector, which paves the way for capturing the long-term properties of the signal. A fixed-length representation of a variable-length pattern such as speech that preserves the task-relevant information allows the use of a wide range of powerful discriminative models that cannot effectively handle variability in pattern length. In the proposed approach, a GMM is trained for each class, and the posterior probabilities of the components of all the GMMs are computed for each data instance (frame), averaged over all frames of the utterance, and finally stacked into a supervector. The main benefits of the proposed method are that it makes the feature extraction task-specific and performs a remarkable dimensionality reduction while preserving the discriminative capability of the extracted features. The method yields a 7.6% absolute performance improvement over the baseline system, a GMM-based classifier, and reaches 87.6% accuracy on the emotion recognition task. Human performance on the employed database (Berlin) is reportedly 84.3%.
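
The supervector construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the GMM parameters below are hypothetical toy values standing in for trained per-class models, the covariances are assumed diagonal, and the posteriors are assumed to be normalized within each class GMM (the abstract does not specify this detail). The function names `component_posteriors` and `utterance_supervector` are illustrative only.

```python
import math


def component_posteriors(frame, gmm):
    """Posterior P(component | frame) within one diagonal-covariance GMM.

    `gmm` is a list of (weight, mean, variance) tuples, one per component.
    """
    log_terms = []
    for weight, mean, var in gmm:
        log_p = math.log(weight)
        for x, m, v in zip(frame, mean, var):
            # Log-density of a 1-D Gaussian, summed over dimensions.
            log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(log_p)
    # Normalize in the log domain for numerical stability.
    peak = max(log_terms)
    exps = [math.exp(t - peak) for t in log_terms]
    total = sum(exps)
    return [e / total for e in exps]


def utterance_supervector(frames, class_gmms):
    """Average per-frame component posteriors and stack them over all GMMs."""
    supervector = []
    for gmm in class_gmms:
        sums = [0.0] * len(gmm)
        for frame in frames:
            for k, p in enumerate(component_posteriors(frame, gmm)):
                sums[k] += p
        supervector.extend(s / len(frames) for s in sums)
    return supervector


# Hypothetical toy models: two classes, two components each, 2-D features.
class_gmms = [
    [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])],
    [(0.4, [-2.0, 1.0], [1.0, 2.0]), (0.6, [1.0, -1.0], [0.5, 1.0])],
]

# Two utterances with different numbers of frames.
short_utt = [[0.1, -0.2], [2.9, 3.1], [0.3, 0.4]]
long_utt = short_utt + [[1.0, -1.0], [-2.0, 1.2]]

sv_short = utterance_supervector(short_utt, class_gmms)
sv_long = utterance_supervector(long_utt, class_gmms)
print(len(sv_short), len(sv_long))
```

Both utterances map to a supervector of length 4 (two GMMs with two components each) regardless of how many frames they contain, which is the fixed-length property that lets a discriminative classifier be applied afterwards.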

Bibliographic details

Source: International Conference on Statistical Language and Speech Processing, 2015, 12 pages
Authors: Erfan Loweimi; Mortaza Doulaty; Jon Barker; Thomas Hain
Original format: PDF
CLC classification: TN912-53
Keywords: Discriminative model; Emotion recognition; Feature extraction; Generative model; Speech signal
Indexed: 2022-08-20 22:46:18

