2012 3rd International Workshop on Cognitive Information Processing (CIP)

Combining semantic and acoustic features for valence and arousal recognition in speech

Abstract

The recognition of affect in speech has attracted a lot of interest recently, especially in the cognitive and computer sciences. Most previous studies focused on recognizing basic emotions (such as happiness, sadness, and anger) using a categorical approach. Recently, the focus has been shifting towards dimensional affect recognition, based on the idea that emotional states are not independent of one another but related in a systematic manner. In this paper, we design a continuous dimensional speech affect recognition model that combines acoustic and semantic features. We build our own corpus, consisting of 59 short movie clips with audio and text in subtitle format, rated by human subjects on the arousal and valence (A-V) dimensions. For the acoustic part, we combine many features, use correlation-based feature selection, and apply support vector regression. For the semantic part, we use the Affective Norms for English Words (ANEW), which are also rated on the A-V dimensions, as keywords and apply latent semantic analysis (LSA) to those words and the words in the clips to estimate A-V values for the clips. Finally, the results of the acoustic and semantic parts are combined. We show that combining semantic and acoustic information for dimensional speech affect recognition improves the results. Moreover, we show that valence is better estimated using semantic features, while arousal is better estimated using acoustic features.
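As a rough sketch of the two-stream pipeline the abstract describes, the Python example below estimates arousal from acoustic features with correlation-based feature selection followed by support vector regression, and estimates valence from subtitle text by weighting ANEW valence ratings with LSA-based similarity. The data, the tiny ANEW-style lexicon, the fusion note, and all hyperparameters are illustrative assumptions, not the authors' actual corpus, toolchain, or configuration.

```python
# Illustrative sketch of the two-stream A-V estimation pipeline described in
# the abstract. Everything here (data, four-word ANEW-style lexicon,
# hyperparameters) is a synthetic placeholder, not the paper's setup.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# ---- Acoustic stream: correlation-based feature selection + SVR ----
# Placeholder acoustic features (e.g. pitch, energy, MFCC statistics)
# and human arousal ratings for 59 clips.
X_acoustic = rng.normal(size=(59, 40))
y_arousal = rng.uniform(1, 9, size=59)

acoustic_model = make_pipeline(
    SelectKBest(score_func=f_regression, k=10),  # keep features most correlated with the rating
    SVR(kernel="rbf", C=1.0),
)
acoustic_model.fit(X_acoustic, y_arousal)
arousal_pred = acoustic_model.predict(X_acoustic)

# ---- Semantic stream: ANEW keywords + LSA ----
# Tiny ANEW-style lexicon: word -> (valence, arousal) ratings on a 1-9 scale.
anew = {"happy": (8.2, 6.5), "sad": (1.6, 4.1), "calm": (6.9, 2.0), "angry": (2.5, 7.2)}
subtitles = [
    "she was so happy and calm that day",
    "he shouted, angry and sad about the loss",
]

# Project subtitles and ANEW keywords into a shared latent space with LSA,
# then use cosine similarity as soft keyword weights.
vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(subtitles + list(anew))
latent = TruncatedSVD(n_components=2, random_state=0).fit_transform(doc_term)
doc_vecs, word_vecs = latent[: len(subtitles)], latent[len(subtitles):]

sims = np.clip(cosine_similarity(doc_vecs, word_vecs), 0.0, None)
weights = sims / (sims.sum(axis=1, keepdims=True) + 1e-9)
anew_valence = np.array([v for v, _ in anew.values()])
valence_pred = weights @ anew_valence  # similarity-weighted average of ANEW valence

# In a full system each stream would predict both dimensions and the
# per-dimension outputs would be fused (e.g. by weighted averaging); here we
# simply report each stream's stronger dimension, as suggested by the abstract.
print("arousal from acoustic stream:", np.round(arousal_pred[:2], 2))
print("valence from semantic stream:", np.round(valence_pred, 2))
```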
