
Automatic emotion recognition: an investigation of acoustic and prosodic parameters


Abstract

An essential step towards achieving human-machine speech communication with the naturalness of communication between humans is developing a machine capable of recognising emotions from speech. This thesis presents research addressing this problem by making use of acoustic and prosodic information.

At the feature level, novel group delay and weighted frequency features are proposed. The group delay features are shown to emphasise information pertaining to formant bandwidths and to be indicative of emotions. The weighted frequency feature, based on the recently introduced empirical mode decomposition, is proposed as a compact representation of the spectral energy distribution and is shown to outperform other estimates of energy distribution. Feature-level comparisons suggest that detailed spectral measures are very indicative of emotions while exhibiting greater speaker specificity. Moreover, it is shown that all features are characteristic of the speaker and require some sort of normalisation prior to use in a multi-speaker situation.

A novel technique for normalising speaker-specific variability in features is proposed, leading to significant improvements in the performance of systems trained and tested on data from different speakers. This technique is also used to investigate the amount of speaker-specific variability in different features. A preliminary study of phonetic variability suggests that phoneme-specific traits are not modelled by the emotion models and that speaker variability is a more significant problem in the investigated setup.

Finally, a novel approach to emotion modelling that takes into account temporal variations of speech parameters is analysed. An explicit model of the glottal spectrum is incorporated into the framework of the traditional source-filter model, and the parameters of this combined model are used to characterise speech signals. An automatic emotion recognition system that takes into account the shape of the contours of these parameters as they vary with time is shown to outperform a system that models only the parameter distributions. The novel approach is also shown empirically to be on par with human emotion classification performance.
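The group delay features mentioned in the abstract are, in standard signal-processing terms, the negative derivative of the unwrapped Fourier phase with respect to frequency. The abstract does not give the thesis's exact variant, so the sketch below computes plain per-frame group delay using the common identity that avoids explicit phase unwrapping; `group_delay` is an illustrative name, not the thesis's.

```python
import numpy as np

def group_delay(frame, n_fft=512):
    """Group delay of one windowed frame: the negative derivative of the
    unwrapped Fourier phase with respect to frequency. Computed with the
    standard identity tau(w) = Re{ Y(w) X*(w) } / |X(w)|^2, where Y is
    the DFT of n * x[n], which sidesteps phase unwrapping."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    denom = np.maximum(np.abs(X) ** 2, 1e-10)  # guard against spectral zeros
    return (Y.real * X.real + Y.imag * X.imag) / denom
```

Frames here would typically be windowed 20-30 ms segments. Sharp, narrow peaks in the group delay correspond to narrow formant bandwidths, which is the information the abstract says these features emphasise.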
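The abstract describes the weighted frequency feature only as an EMD-based compact summary of the spectral energy distribution. One plausible reading, sketched below under that assumption, is an energy-weighted mean of the instantaneous frequencies of the intrinsic mode functions (IMFs); the IMFs are taken as given (e.g., from an EMD implementation such as the PyEMD package), and `weighted_frequency` is an illustrative name.

```python
import numpy as np
from scipy.signal import hilbert

def weighted_frequency(imfs, fs):
    """Energy-weighted mean instantaneous frequency over a set of IMFs.

    imfs : array of shape (n_imfs, n_samples), e.g. from an EMD of a frame
    fs   : sampling rate in Hz
    """
    num, den = 0.0, 0.0
    for imf in imfs:
        analytic = hilbert(imf)
        phase = np.unwrap(np.angle(analytic))
        inst_freq = np.diff(phase) * fs / (2 * np.pi)  # Hz, length n-1
        energy = np.abs(analytic[:-1]) ** 2            # align with inst_freq
        num += np.sum(energy * inst_freq)
        den += np.sum(energy)
    return num / max(den, 1e-10)
```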
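The speaker normalisation technique itself is not specified in the abstract. A common baseline consistent with the description, which removes speaker-specific offsets and scales before training on multi-speaker data, is per-speaker z-scoring of each feature dimension; the sketch below uses illustrative names and is not the thesis's method.

```python
import numpy as np

def normalise_per_speaker(features, speaker_ids):
    """Z-score each feature dimension within each speaker.

    features    : (n_frames, n_dims) feature matrix
    speaker_ids : length-n_frames array of speaker labels
    Returns a copy in which every speaker's features have zero mean and
    unit variance, removing speaker-specific variability before emotion
    modelling.
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        mu = features[mask].mean(axis=0)
        sigma = features[mask].std(axis=0) + 1e-10
        out[mask] = (features[mask] - mu) / sigma
    return out
```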
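The combined model of the final part pairs an explicit glottal spectrum with the traditional source-filter decomposition (glottal source x vocal tract filter x lip radiation). The abstract does not give the glottal model used, so the sketch below is a deliberately simple stand-in: a second-order low-pass source (roughly -12 dB/octave), resonances for the vocal tract, and a first-order radiation term.

```python
import numpy as np

def combined_spectrum(freqs, glottal_fc=100.0,
                      formants=((500, 60), (1500, 90), (2500, 120))):
    """Illustrative magnitude spectrum of a source-filter model with an
    explicit glottal term: a second-order low-pass glottal source above
    glottal_fc, an all-pole vocal tract built from (centre frequency,
    bandwidth) resonances in Hz, and a +6 dB/octave lip-radiation term."""
    freqs = np.asarray(freqs, dtype=float)
    source = 1.0 / (1.0 + (freqs / glottal_fc) ** 2)
    s = 1j * 2 * np.pi * freqs
    tract = np.ones_like(freqs)
    for fc, bw in formants:
        p = -np.pi * bw + 1j * 2 * np.pi * fc   # pole from centre/bandwidth
        tract *= np.abs(p) ** 2 / np.abs((s - p) * (s - np.conj(p)))
    radiation = np.maximum(freqs, 1.0)
    return source * tract * radiation
```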
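Modelling the shape of a parameter contour, rather than only its distribution, requires features that keep temporal ordering. The abstract does not state how contour shape is encoded; one common device, used here purely for illustration, is to keep the first few DCT coefficients of the track: the first captures the overall level a distribution model would also see, while the rest capture the temporal shape it would discard.

```python
import numpy as np
from scipy.fft import dct

def contour_shape_features(contour, n_coeffs=5):
    """Summarise the temporal shape of a parameter contour (e.g., an F0
    or glottal-parameter track over an utterance) with its first few
    DCT coefficients."""
    contour = np.asarray(contour, dtype=float)
    coeffs = dct(contour, norm='ortho')
    return coeffs[:n_coeffs]
```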
