Journal: Quality Control, Transactions

Mutual Complementarity: Multi-Modal Enhancement Semantic Learning for Micro-Video Scene Recognition


Abstract

Scene recognition is one of the hot topics in micro-video understanding, where multi-modal information is commonly used due to its efficient representation ability. However, using multi-modal information poses challenges: semantic consistency among the modalities of a micro-video is weaker than in traditional videos, and the contributions of the individual modalities often differ. To address these issues, this study proposes a multi-modal enhancement semantic learning method for micro-video scene recognition. In the proposed method, the visual modality is treated as the main modality, while other modalities such as text and audio serve as auxiliary modalities. We propose a deep multi-modal fusion network for scene recognition that enhances the semantics of the auxiliary modalities using the main modality. Furthermore, the fusion weights of the modalities are learned adaptively. Experiments demonstrate the effectiveness of both the enhancement and the adaptive weight learning in multi-modal fusion for micro-video scene recognition.
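The two ideas in the abstract, enhancing auxiliary modalities with the main (visual) modality and fusing all modalities with adaptively learned weights, can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual network: the sigmoid gating form, the function names, and the feature dimensions are all assumptions, and the weights shown fixed here would be learned end-to-end in the real model.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax for the adaptive fusion weights
    e = np.exp(x - x.max())
    return e / e.sum()

def enhance(aux, main, W):
    # enhance an auxiliary modality (text/audio) feature by gating it
    # with a sigmoid projection of the main (visual) modality feature
    gate = 1.0 / (1.0 + np.exp(-(W @ main)))
    return gate * aux

def adaptive_fuse(main, aux_feats, W_enh, w_logits):
    # 1) enhance each auxiliary modality using the main modality
    enhanced = [enhance(a, main, W) for a, W in zip(aux_feats, W_enh)]
    # 2) fuse main + enhanced auxiliaries with softmax-normalized
    #    adaptive weights (w_logits would be trained parameters)
    feats = [main] + enhanced
    weights = softmax(w_logits)
    return sum(w * f for w, f in zip(weights, feats))

# toy usage with 4-dim features, two auxiliary modalities (text, audio)
rng = np.random.default_rng(0)
visual = rng.normal(size=4)
aux = [rng.normal(size=4), rng.normal(size=4)]
W_enh = [rng.normal(size=(4, 4)), rng.normal(size=(4, 4))]
fused = adaptive_fuse(visual, aux, W_enh, np.zeros(3))
```

With zero logits the three modalities are weighted equally (1/3 each); training would move the logits so that more informative modalities dominate the fused representation.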
