International Conference on Speech and Computer

Improving Speech-Based Emotion Recognition by Using Psychoacoustic Modeling and Analysis-by-Synthesis



Abstract

Most technical communication systems use speech compression codecs to save transmission bandwidth. Much development effort has gone into guaranteeing high speech intelligibility, resulting in different compression techniques: Analysis-by-Synthesis, psychoacoustic modeling, and a hybrid mode combining both. Our first assumption is that the hybrid mode improves speech intelligibility. However, enabling a natural spoken conversation also requires the affective, namely emotional, information contained in spoken language to be transmitted intelligibly. Compression is usually avoided in emotion recognition, as it is feared that it degrades the acoustic characteristics needed for accurate recognition [1]. By contrast, our second assumption states that combining psychoacoustic modeling with Analysis-by-Synthesis codecs could actually improve speech-based emotion recognition by removing parts of the acoustic signal that are considered "unnecessary" while still retaining the full emotional information. To test both assumptions, we conducted an ITU-recommended POLQA measurement as well as several emotion recognition experiments on two different datasets to verify the generality of these assumptions. We compared the results of the hybrid mode with those of Analysis-by-Synthesis-only and psychoacoustic-modeling-only codecs. The hybrid mode shows no remarkable difference in speech intelligibility, but it outperforms all other compression settings in the multi-class emotion recognition experiments and even achieves an ~3.3% higher absolute performance than the uncompressed samples.
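The compress-then-recognize evaluation described in the abstract can be sketched as a toy pipeline. This is not the authors' implementation: the `psychoacoustic_mask` function below is a crude stand-in for a real psychoacoustic codec (it simply zeroes the weakest spectral components, loosely analogous to discarding "inaudible" signal parts), and the frame-level log-energy features are a placeholder for a full emotion-recognition feature set. All names and parameters are illustrative assumptions.

```python
import numpy as np

def psychoacoustic_mask(signal, keep_fraction=0.6):
    """Toy stand-in for psychoacoustic modeling: drop the weakest
    spectral components, keeping roughly `keep_fraction` of them."""
    spectrum = np.fft.rfft(signal)
    magnitudes = np.abs(spectrum)
    threshold = np.quantile(magnitudes, 1.0 - keep_fraction)
    masked = np.where(magnitudes >= threshold, spectrum, 0.0)
    return np.fft.irfft(masked, n=len(signal))

def frame_energy_features(signal, frame_len=400, hop=160):
    """Simple frame-level log-energy features (placeholder for the
    acoustic feature sets used in emotion recognition)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.log(np.sum(f ** 2) + 1e-12) for f in frames])

# Synthetic 1-second "utterance": two tones plus noise, 16 kHz.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
speech = (np.sin(2 * np.pi * 220 * t)
          + 0.3 * np.sin(2 * np.pi * 880 * t)
          + 0.05 * rng.standard_normal(sr))

# Same feature extraction on raw and "compressed" audio, so a
# downstream classifier could be evaluated on both conditions.
compressed = psychoacoustic_mask(speech)
feats_raw = frame_energy_features(speech)
feats_comp = frame_energy_features(compressed)
```

In a real evaluation, the masking step would be replaced by an actual codec pass and the features fed to a trained multi-class emotion classifier; the sketch only illustrates that masking removes signal energy while the feature representation stays directly comparable across conditions.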

