2012 3rd International Workshop on Cognitive Information Processing (CIP)

Combining semantic and acoustic features for valence and arousal recognition in speech

Abstract

The recognition of affect in speech has attracted a lot of interest recently, especially in the cognitive and computer sciences. Most previous studies focused on recognizing basic emotions (such as happiness, sadness, and anger) using a categorical approach. Recently, the focus has been shifting towards dimensional affect recognition, based on the idea that emotional states are not independent of one another but related in a systematic manner. In this paper, we design a continuous dimensional speech affect recognition model that combines acoustic and semantic features. We build our own corpus, consisting of 59 short movie clips with audio and text in subtitle format, rated by human subjects on the arousal and valence (A-V) dimensions. For the acoustic part, we combine many features, use correlation-based feature selection, and apply support vector regression. For the semantic part, we use the Affective Norms for English Words (ANEW), which are also rated on the A-V dimensions, as keywords and apply latent semantic analysis (LSA) to those words and the words in the clips to estimate A-V values for the clips. Finally, the results of the acoustic and semantic parts are combined. We show that combining semantic and acoustic information for dimensional speech affect recognition improves the results. Moreover, we show that valence is better estimated using semantic features, while arousal is better estimated using acoustic features.
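As a rough sketch of the two-stream pipeline the abstract describes, the Python example below estimates arousal from acoustic features with correlation-based feature selection followed by support vector regression, and estimates valence from subtitle text by weighting ANEW valence ratings with LSA-based similarity. The data, the tiny ANEW-style lexicon, the fusion note, and all hyperparameters are illustrative assumptions, not the authors' actual corpus, toolchain, or configuration.

```python
# Illustrative sketch of the two-stream A-V estimation pipeline described in
# the abstract. Everything here (data, four-word ANEW-style lexicon,
# hyperparameters) is a synthetic placeholder, not the paper's setup.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# ---- Acoustic stream: correlation-based feature selection + SVR ----
# Placeholder acoustic features (e.g. pitch, energy, MFCC statistics)
# and human arousal ratings for 59 clips.
X_acoustic = rng.normal(size=(59, 40))
y_arousal = rng.uniform(1, 9, size=59)

acoustic_model = make_pipeline(
    SelectKBest(score_func=f_regression, k=10),  # keep features most correlated with the rating
    SVR(kernel="rbf", C=1.0),
)
acoustic_model.fit(X_acoustic, y_arousal)
arousal_pred = acoustic_model.predict(X_acoustic)

# ---- Semantic stream: ANEW keywords + LSA ----
# Tiny ANEW-style lexicon: word -> (valence, arousal) ratings on a 1-9 scale.
anew = {"happy": (8.2, 6.5), "sad": (1.6, 4.1), "calm": (6.9, 2.0), "angry": (2.5, 7.2)}
subtitles = [
    "she was so happy and calm that day",
    "he shouted, angry and sad about the loss",
]

# Project subtitles and ANEW keywords into a shared latent space with LSA,
# then use cosine similarity as soft keyword weights.
vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(subtitles + list(anew))
latent = TruncatedSVD(n_components=2, random_state=0).fit_transform(doc_term)
doc_vecs, word_vecs = latent[: len(subtitles)], latent[len(subtitles):]

sims = np.clip(cosine_similarity(doc_vecs, word_vecs), 0.0, None)
weights = sims / (sims.sum(axis=1, keepdims=True) + 1e-9)
anew_valence = np.array([v for v, _ in anew.values()])
valence_pred = weights @ anew_valence  # similarity-weighted average of ANEW valence

# In a full system each stream would predict both dimensions and the
# per-dimension outputs would be fused (e.g. by weighted averaging); here we
# simply report each stream's stronger dimension, as suggested by the abstract.
print("arousal from acoustic stream:", np.round(arousal_pred[:2], 2))
print("valence from semantic stream:", np.round(valence_pred, 2))
```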
