IEEE International Conference on Acoustics, Speech and Signal Processing

Fusion Approaches for Emotion Recognition from Speech Using Acoustic and Text-Based Features



Abstract

In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose to obtain contextualized word embeddings with BERT to represent the information contained in speech transcriptions and show that this results in better performance than using GloVe embeddings. We also propose and compare different strategies to combine the audio and text modalities, evaluating them on the IEMOCAP and MSP-Podcast datasets. We find that fusing acoustic and text-based systems is beneficial on both datasets, though only subtle differences are observed across the evaluated fusion approaches. Finally, for IEMOCAP, we show the large effect that the criteria used to define the cross-validation folds have on results. In particular, the standard way of creating folds for this dataset results in a highly optimistic estimation of performance for the text-based system, suggesting that some previous works may overestimate the advantage of incorporating transcriptions.
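As an illustration of the score-level (late) fusion strategy the abstract refers to, the sketch below takes per-class posteriors from two hypothetical classifiers (one acoustic, one text-based) and averages them with a mixing weight. The class count, the weight, and the example probabilities are assumptions for illustration, not values from the paper.

```python
import numpy as np

def late_fusion(acoustic_probs, text_probs, weight=0.5):
    """Score-level fusion: weighted average of per-class posteriors
    from an acoustic classifier and a text-based classifier."""
    acoustic_probs = np.asarray(acoustic_probs, dtype=float)
    text_probs = np.asarray(text_probs, dtype=float)
    fused = weight * acoustic_probs + (1.0 - weight) * text_probs
    # Renormalize so each row is a valid probability distribution
    return fused / fused.sum(axis=-1, keepdims=True)

# Hypothetical posteriors over 4 emotion classes
# (e.g. angry, happy, sad, neutral) for one utterance
acoustic = [[0.6, 0.2, 0.1, 0.1]]
text = [[0.3, 0.5, 0.1, 0.1]]
fused = late_fusion(acoustic, text)
```

With equal weights the two modalities contribute symmetrically; tuning the weight on a development set is one of the simplest ways to compare fusion strategies.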
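The fold-definition issue can be made concrete with a small sketch: assigning whole speakers to folds keeps a speaker's utterances (and their transcriptions) from appearing in both the train and test partitions, which is the kind of leakage that can inflate text-based results. The helper and the speaker IDs below are hypothetical, not the paper's exact protocol.

```python
from collections import defaultdict

def speaker_disjoint_folds(speakers, n_folds):
    """Assign whole speakers to folds so that no speaker's
    utterances appear in more than one fold."""
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speakers):
        by_speaker[spk].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Round-robin speakers into folds, largest speakers first,
    # to keep fold sizes roughly balanced
    ordered = sorted(by_speaker.items(), key=lambda kv: -len(kv[1]))
    for i, (spk, idxs) in enumerate(ordered):
        folds[i % n_folds].extend(idxs)
    return folds

# Hypothetical utterance list tagged with speaker IDs
speakers = ["Ses01F", "Ses01F", "Ses01M", "Ses01M", "Ses02F",
            "Ses02F", "Ses02M", "Ses02M", "Ses03F", "Ses03F"]
folds = speaker_disjoint_folds(speakers, n_folds=5)
```

Evaluating with such speaker-disjoint folds gives a more pessimistic, but more honest, estimate than folds that mix a speaker's utterances across partitions.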
