Neural Networks: The Official Journal of the International Neural Network Society

A multimodal convolutional neuro-fuzzy network for emotion understanding of movie clips



Abstract

Multimodal emotion understanding enables AI systems to interpret human emotions. With the rapid surge in video data, emotion understanding remains challenging due to the inherent ambiguity of the data and the diversity of video content. Although deep learning has made considerable progress in feature learning from big data, deep models are typically deterministic, used in a "black-box" manner, and lack the capability to represent the inherent ambiguities in data. Since the possibility theory of fuzzy logic focuses on knowledge representation and reasoning under uncertainty, we incorporate concepts of fuzzy logic into the deep learning framework. This paper presents a novel convolutional neuro-fuzzy network, an integration of convolutional neural networks with the fuzzy logic domain, to extract high-level emotion features from the text, audio, and visual modalities. The feature sets extracted by the fuzzy convolutional layers are compared with those of convolutional neural networks at the same level using t-distributed Stochastic Neighbor Embedding (t-SNE). The paper demonstrates a multimodal emotion understanding framework built on an adaptive neuro-fuzzy inference system (ANFIS) that can generate new rules to classify emotions. For emotion understanding of movie clips, we concatenate the audio, visual, and text features extracted by the proposed convolutional neuro-fuzzy network to train the ANFIS. We then go one step further and explain how the deep model arrives at its conclusions, a step toward interpretable AI. To identify which visual/text/audio aspects are important for emotion understanding, we use the direct linear non-Gaussian acyclic model (DirectLiNGAM) to explain relevance in terms of causal relationships between features of the deep hidden layers. The critical features thus extracted are fed into the proposed multimodal framework to achieve higher accuracy. (C) 2019 Elsevier Ltd. All rights reserved.
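The abstract does not specify the internal design of the fuzzy convolutional layers, so the following is only a minimal sketch of one common construction: each input channel is fuzzified with learnable Gaussian membership functions, and a standard convolution then aggregates the resulting membership maps. The class name FuzzyConv2d, the number of membership functions n_mfs, and the Gaussian parameterization are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class FuzzyConv2d(nn.Module):
    """Hypothetical fuzzy convolutional layer: fuzzify each input channel
    with n_mfs Gaussian membership functions, then convolve over the
    stacked membership maps."""
    def __init__(self, in_channels, out_channels, kernel_size, n_mfs=3):
        super().__init__()
        # Learnable centers/widths: one set of membership functions per channel.
        self.centers = nn.Parameter(torch.linspace(-1.0, 1.0, n_mfs).repeat(in_channels, 1))
        self.widths = nn.Parameter(torch.ones(in_channels, n_mfs))
        self.conv = nn.Conv2d(in_channels * n_mfs, out_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (batch, C, H, W)
        b, c, h, w = x.shape
        xe = x.unsqueeze(2)                     # (batch, C, 1, H, W)
        # Gaussian membership degrees of every pixel, per membership function.
        mu = torch.exp(-((xe - self.centers.view(1, c, -1, 1, 1)) ** 2)
                       / (2 * self.widths.view(1, c, -1, 1, 1) ** 2 + 1e-6))
        return self.conv(mu.reshape(b, -1, h, w))

# Usage: FuzzyConv2d(3, 16, 3)(torch.randn(4, 3, 32, 32)) -> (4, 16, 32, 32)
```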
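The t-SNE comparison between the two feature sets can be sketched with scikit-learn as below; fuzzy_feats and cnn_feats are placeholder arrays standing in for activations collected at the same layer depth, not variables from the paper.

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder features: (n_samples, n_dims) activations from the fuzzy
# convolutional layer and from a plain CNN layer at the same depth.
fuzzy_feats = np.random.rand(500, 128)
cnn_feats = np.random.rand(500, 128)

emb_fuzzy = TSNE(n_components=2, perplexity=30).fit_transform(fuzzy_feats)
emb_cnn = TSNE(n_components=2, perplexity=30).fit_transform(cnn_feats)
# Plotting emb_fuzzy and emb_cnn side by side, colored by emotion label,
# shows how well each feature set separates the classes.
```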
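The abstract states that the per-modality features are concatenated and used to train an ANFIS. A compact first-order Sugeno-style forward pass might look like the sketch below; the Gaussian memberships, product t-norm, and all dimensions are assumptions, and the paper's rule-generation mechanism is not reproduced here.

```python
import torch
import torch.nn as nn

class TinyANFIS(nn.Module):
    """First-order Sugeno ANFIS sketch: Gaussian memberships, product
    firing strengths (computed in log space), rule-wise linear consequents."""
    def __init__(self, n_inputs, n_rules, n_classes):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_rules, n_inputs))
        self.widths = nn.Parameter(torch.ones(n_rules, n_inputs))
        self.consequents = nn.Linear(n_inputs, n_rules * n_classes)
        self.n_rules, self.n_classes = n_rules, n_classes

    def forward(self, x):                                  # x: (batch, n_inputs)
        d = (x.unsqueeze(1) - self.centers) / (self.widths.abs() + 1e-6)
        log_firing = -0.5 * (d ** 2).sum(dim=-1)           # log-product of Gaussians
        w = torch.softmax(log_firing, dim=-1)              # normalized firing strengths
        y = self.consequents(x).view(-1, self.n_rules, self.n_classes)
        return (w.unsqueeze(-1) * y).sum(dim=1)            # (batch, n_classes)

# Concatenate per-modality features (dimensions are illustrative only).
audio_f, visual_f, text_f = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 32)
fused = torch.cat([audio_f, visual_f, text_f], dim=1)      # (8, 160)
logits = TinyANFIS(n_inputs=160, n_rules=16, n_classes=7)(fused)
```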
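Finally, the causal analysis of hidden-layer features can be reproduced with the open-source lingam package, which implements DirectLiNGAM; the feature matrix below is a placeholder, and the abstract does not say whether the authors used this particular implementation.

```python
import numpy as np
import lingam   # pip install lingam

# Placeholder: deep hidden-layer activations, shape (n_samples, n_features).
hidden_features = np.random.rand(1000, 10)

model = lingam.DirectLiNGAM()
model.fit(hidden_features)
# adjacency_matrix_[i, j] is the estimated causal effect of feature j on
# feature i; strongly connected features are candidates for the "critical"
# inputs fed back into the multimodal framework.
print(model.adjacency_matrix_)
```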

