IEEE International Conference on Acoustics, Speech and Signal Processing

Fusion Approaches for Emotion Recognition from Speech Using Acoustic and Text-Based Features



Abstract

In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose to obtain contextualized word embeddings with BERT to represent the information contained in speech transcriptions and show that this results in better performance than using GloVe embeddings. We also propose and compare different strategies to combine the audio and text modalities, evaluating them on the IEMOCAP and MSP-Podcast datasets. We find that fusing acoustic and text-based systems is beneficial on both datasets, though only subtle differences are observed across the evaluated fusion approaches. Finally, for IEMOCAP, we show the large effect that the criteria used to define the cross-validation folds have on results. In particular, the standard way of creating folds for this dataset results in a highly optimistic estimation of performance for the text-based system, suggesting that some previous works may overestimate the advantage of incorporating transcriptions.
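As an illustration of the score-level (late) fusion strategy the abstract refers to, the sketch below takes per-class posteriors from two hypothetical classifiers (one acoustic, one text-based) and averages them with a mixing weight. The class count, the weight, and the example probabilities are assumptions for illustration, not values from the paper.

```python
import numpy as np

def late_fusion(acoustic_probs, text_probs, weight=0.5):
    """Score-level fusion: weighted average of per-class posteriors
    from an acoustic classifier and a text-based classifier."""
    acoustic_probs = np.asarray(acoustic_probs, dtype=float)
    text_probs = np.asarray(text_probs, dtype=float)
    fused = weight * acoustic_probs + (1.0 - weight) * text_probs
    # Renormalize so each row is a valid probability distribution
    return fused / fused.sum(axis=-1, keepdims=True)

# Hypothetical posteriors over 4 emotion classes
# (e.g. angry, happy, sad, neutral) for one utterance
acoustic = [[0.6, 0.2, 0.1, 0.1]]
text = [[0.3, 0.5, 0.1, 0.1]]
fused = late_fusion(acoustic, text)
```

With equal weights the two modalities contribute symmetrically; tuning the weight on a development set is one of the simplest ways to compare fusion strategies.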
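The fold-definition issue can be made concrete with a small sketch: assigning whole speakers to folds keeps a speaker's utterances (and their transcriptions) from appearing in both the train and test partitions, which is the kind of leakage that can inflate text-based results. The helper and the speaker IDs below are hypothetical, not the paper's exact protocol.

```python
from collections import defaultdict

def speaker_disjoint_folds(speakers, n_folds):
    """Assign whole speakers to folds so that no speaker's
    utterances appear in more than one fold."""
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speakers):
        by_speaker[spk].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Round-robin speakers into folds, largest speakers first,
    # to keep fold sizes roughly balanced
    ordered = sorted(by_speaker.items(), key=lambda kv: -len(kv[1]))
    for i, (spk, idxs) in enumerate(ordered):
        folds[i % n_folds].extend(idxs)
    return folds

# Hypothetical utterance list tagged with speaker IDs
speakers = ["Ses01F", "Ses01F", "Ses01M", "Ses01M", "Ses02F",
            "Ses02F", "Ses02M", "Ses02M", "Ses03F", "Ses03F"]
folds = speaker_disjoint_folds(speakers, n_folds=5)
```

Evaluating with such speaker-disjoint folds gives a more pessimistic, but more honest, estimate than folds that mix a speaker's utterances across partitions.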
