International Conference on Speech and Computer

Improving Speech-Based Emotion Recognition by Using Psychoacoustic Modeling and Analysis-by-Synthesis



Abstract

Most technical communication systems use speech compression codecs to save transmission bandwidth. Much development effort has gone into guaranteeing high speech intelligibility, resulting in different compression techniques: Analysis-by-Synthesis, psychoacoustic modeling, and a hybrid mode combining both. Our first assumption is that the hybrid mode improves speech intelligibility. However, enabling a natural spoken conversation also requires the affective, namely emotional, information contained in spoken language to be transmitted intelligibly. Compression is usually avoided in emotion recognition, as it is feared that it degrades the acoustic characteristics needed for accurate recognition [1]. By contrast, our second assumption states that combining psychoacoustic modeling with Analysis-by-Synthesis codecs could actually improve speech-based emotion recognition by removing parts of the acoustic signal that are considered "unnecessary" while still retaining the full emotional information. To test both assumptions, we conducted an ITU-recommended POLQA measurement as well as several emotion recognition experiments on two different datasets to verify the generality of these assumptions. We compared the results of the hybrid mode with those of Analysis-by-Synthesis-only and psychoacoustic-modeling-only codecs. The hybrid mode shows no remarkable difference in speech intelligibility, but it outperforms all other compression settings in the multi-class emotion recognition experiments and even achieves an ~3.3% higher absolute performance than the uncompressed samples.
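The compress-then-recognize evaluation described in the abstract can be sketched as a toy pipeline. This is not the authors' implementation: the `psychoacoustic_mask` function below is a crude stand-in for a real psychoacoustic codec (it simply zeroes the weakest spectral components, loosely analogous to discarding "inaudible" signal parts), and the frame-level log-energy features are a placeholder for a full emotion-recognition feature set. All names and parameters are illustrative assumptions.

```python
import numpy as np

def psychoacoustic_mask(signal, keep_fraction=0.6):
    """Toy stand-in for psychoacoustic modeling: drop the weakest
    spectral components, keeping roughly `keep_fraction` of them."""
    spectrum = np.fft.rfft(signal)
    magnitudes = np.abs(spectrum)
    threshold = np.quantile(magnitudes, 1.0 - keep_fraction)
    masked = np.where(magnitudes >= threshold, spectrum, 0.0)
    return np.fft.irfft(masked, n=len(signal))

def frame_energy_features(signal, frame_len=400, hop=160):
    """Simple frame-level log-energy features (placeholder for the
    acoustic feature sets used in emotion recognition)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.log(np.sum(f ** 2) + 1e-12) for f in frames])

# Synthetic 1-second "utterance": two tones plus noise, 16 kHz.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
speech = (np.sin(2 * np.pi * 220 * t)
          + 0.3 * np.sin(2 * np.pi * 880 * t)
          + 0.05 * rng.standard_normal(sr))

# Same feature extraction on raw and "compressed" audio, so a
# downstream classifier could be evaluated on both conditions.
compressed = psychoacoustic_mask(speech)
feats_raw = frame_energy_features(speech)
feats_comp = frame_energy_features(compressed)
```

In a real evaluation, the masking step would be replaced by an actual codec pass and the features fed to a trained multi-class emotion classifier; the sketch only illustrates that masking removes signal energy while the feature representation stays directly comparable across conditions.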

