First Grand Challenge and Workshop on Human Multimodal Language 2018

Getting the subtext without the text: Scalable multimodal sentiment classification from visual and acoustic modalities


Abstract

In the last decade, video blogs (vlogs) have become an extremely popular method through which people express sentiment. The ubiquitousness of these videos has increased the importance of multimodal fusion models, which incorporate video and audio features with traditional text features for automatic sentiment detection. Multimodal fusion offers a unique opportunity to build models that learn from the full depth of expression available to human viewers. In the detection of sentiment in these videos, acoustic and video features provide clarity to otherwise ambiguous transcripts. In this paper, we present a multimodal fusion model that exclusively uses high-level video and audio features to analyze spoken sentences for sentiment. We discard traditional transcription features in order to minimize human intervention and to maximize the deployability of our model on at-scale real-world data. We select high-level features for our model that have been successful in non-affect domains in order to test their generalizability in the sentiment detection domain. We train and test our model on the newly released CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, obtaining an F_1 score of 0.8049 on the validation set and an F_1 score of 0.6325 on the held-out challenge test set.
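To make the text-free fusion idea concrete, the sketch below shows one simple way such a classifier can be structured: precomputed high-level acoustic and visual feature vectors are projected separately, concatenated, and passed to a small feed-forward head that predicts binary sentiment. This is only an illustration under assumed details, not the architecture described in the paper; the feature dimensions (74 acoustic, 35 visual), layer sizes, and the early-fusion-by-concatenation design are placeholders chosen for the example.

```python
# Minimal sketch of a text-free audio-visual sentiment classifier.
# All dimensions and layer choices are illustrative assumptions, not the
# authors' reported model.
import torch
import torch.nn as nn


class AudioVisualFusion(nn.Module):
    def __init__(self, acoustic_dim: int = 74, visual_dim: int = 35, hidden: int = 128):
        super().__init__()
        # Project each modality separately, then fuse by concatenation.
        self.acoustic_proj = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        self.visual_proj = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, 1),  # single logit for positive sentiment
        )

    def forward(self, acoustic: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.acoustic_proj(acoustic), self.visual_proj(visual)], dim=-1)
        return self.classifier(fused).squeeze(-1)


if __name__ == "__main__":
    model = AudioVisualFusion()
    # Dummy batch of 4 utterance-level feature vectors (sizes are placeholders).
    acoustic = torch.randn(4, 74)
    visual = torch.randn(4, 35)
    logits = model(acoustic, visual)
    loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([1.0, 0.0, 1.0, 0.0]))
    print(logits.shape, loss.item())
```

In practice, utterance-level acoustic and visual descriptors (e.g., averaged frame-level features) would replace the random tensors, and the binary targets would come from thresholded sentiment annotations such as those in CMU-MOSEI.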
