Conference: International Conference on Statistical Language and Speech Processing

Long-Term Statistical Feature Extraction from Speech Signal and Its Application in Emotion Recognition


Abstract

In this paper we propose a statistics-based parametrization framework that represents speech as a fixed-length supervector, which paves the way for capturing the long-term properties of the signal. A fixed-length representation of a variable-length pattern such as speech, one that preserves the task-relevant information, allows the use of a wide range of powerful discriminative models that cannot effectively handle variability in pattern length. In the proposed approach, a GMM is trained for each class; the posterior probabilities of the components of all the GMMs are computed for each data instance (frame), averaged over all frames of the utterance, and finally stacked into a supervector. The main benefits of the proposed method are that it makes feature extraction task-specific and achieves a substantial dimensionality reduction while preserving the discriminative capability of the extracted features. The method yields a 7.6% absolute improvement over the baseline system, a GMM-based classifier, and reaches 87.6% accuracy on the emotion recognition task. Human performance on the database used (Berlin) is reportedly 84.3%.
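
The supervector construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions the abstract does not state: the GMMs use diagonal covariances, posteriors are normalized within each class GMM separately, and the GMM parameters are already trained (the toy values below are made up for illustration). The names `log_gauss_diag`, `component_posteriors`, and `supervector` are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x (assumed form)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, gmm):
    """Posterior probability of each component of one GMM given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, mean, var)
        for w, mean, var in zip(gmm["weights"], gmm["means"], gmm["vars"])
    ]
    top = max(logs)  # shift by the max before exponentiating, for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def supervector(frames, class_gmms):
    """Average component posteriors over all frames, then stack across GMMs."""
    stacked = []
    for gmm in class_gmms:
        sums = [0.0] * len(gmm["weights"])
        for x in frames:
            for k, p in enumerate(component_posteriors(x, gmm)):
                sums[k] += p
        stacked.extend(s / len(frames) for s in sums)
    return stacked


# Toy parameters standing in for two trained class GMMs (illustrative only).
gmm_a = {
    "weights": [0.5, 0.5],
    "means": [[0.0, 0.0], [3.0, 3.0]],
    "vars": [[1.0, 1.0], [1.0, 1.0]],
}
gmm_b = {
    "weights": [0.3, 0.7],
    "means": [[-2.0, 1.0], [2.0, -1.0]],
    "vars": [[1.0, 2.0], [0.5, 1.0]],
}

# Two utterances with different numbers of 2-dimensional frames.
short_utt = [[0.1, -0.2], [2.9, 3.1], [0.0, 0.3]]
long_utt = short_utt + [[1.0, 1.0], [-1.5, 0.5]]

sv_short = supervector(short_utt, [gmm_a, gmm_b])
sv_long = supervector(long_utt, [gmm_a, gmm_b])
print(len(sv_short), len(sv_long))  # prints "4 4"
```

Both utterances map to a vector whose length equals the total number of components across the class GMMs, regardless of how many frames they contain. This fixed length is what would let a discriminative classifier that expects fixed-size inputs be trained on the result.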
