Journal: Quality Control, Transactions

Mutual Complementarity: Multi-Modal Enhancement Semantic Learning for Micro-Video Scene Recognition


Abstract

Scene recognition is one of the hot topics in micro-video understanding, where multi-modal information is commonly used due to its efficient representation ability. However, using multi-modal information poses challenges: semantic consistency among the modalities of a micro-video is weaker than in traditional videos, and the contributions of the individual modalities often differ. To address these issues, this study proposes a multi-modal enhancement semantic learning method for micro-video scene recognition. In the proposed method, the visual modality is treated as the main modality, while other modalities such as text and audio serve as auxiliary modalities. We propose a deep multi-modal fusion network for scene recognition that enhances the semantics of the auxiliary modalities using the main modality. Furthermore, the fusion weights of the modalities are learned adaptively. Experiments demonstrate the effectiveness of both the enhancement and the adaptive weight learning in multi-modal fusion for micro-video scene recognition.
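The two ideas in the abstract, enhancing auxiliary modalities with the main (visual) modality and fusing all modalities with adaptively learned weights, can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual network: the sigmoid gating form, the function names, and the feature dimensions are all assumptions, and the weights shown fixed here would be learned end-to-end in the real model.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax for the adaptive fusion weights
    e = np.exp(x - x.max())
    return e / e.sum()

def enhance(aux, main, W):
    # enhance an auxiliary modality (text/audio) feature by gating it
    # with a sigmoid projection of the main (visual) modality feature
    gate = 1.0 / (1.0 + np.exp(-(W @ main)))
    return gate * aux

def adaptive_fuse(main, aux_feats, W_enh, w_logits):
    # 1) enhance each auxiliary modality using the main modality
    enhanced = [enhance(a, main, W) for a, W in zip(aux_feats, W_enh)]
    # 2) fuse main + enhanced auxiliaries with softmax-normalized
    #    adaptive weights (w_logits would be trained parameters)
    feats = [main] + enhanced
    weights = softmax(w_logits)
    return sum(w * f for w, f in zip(weights, feats))

# toy usage with 4-dim features, two auxiliary modalities (text, audio)
rng = np.random.default_rng(0)
visual = rng.normal(size=4)
aux = [rng.normal(size=4), rng.normal(size=4)]
W_enh = [rng.normal(size=(4, 4)), rng.normal(size=(4, 4))]
fused = adaptive_fuse(visual, aux, W_enh, np.zeros(3))
```

With zero logits the three modalities are weighted equally (1/3 each); training would move the logits so that more informative modalities dominate the fused representation.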
