
Pyramid regional graph representation learning for content-based video retrieval


Abstract

Conventional video retrieval methods typically aggregate per-frame visual features into a single video-level feature, treating each frame as an isolated, static image. Such methods cannot model the intra-frame and inter-frame relationships among local regions, and they are often vulnerable to the visual redundancy and noise introduced by video transformations and edits, such as added image patches or banners. From the perspective of video retrieval, a video's key information is most often conveyed by geometrically centered, dynamic visual content, whereas static areas tend to lie farther from the center and exhibit heavy temporal redundancy. Conventional retrieval methods have rarely investigated this phenomenon. In this article, we propose an unsupervised video retrieval method that simultaneously models intra-frame and inter-frame contextual information for video representation, using a graph topology constructed on top of pyramid regional feature maps. By decomposing each frame into a pyramid regional sub-graph and transforming the video into a regional graph, we use graph convolutional networks to extract features that incorporate information from multiple types of context. Our method is unsupervised and uses only the frame features extracted by a pre-trained network. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art video retrieval methods.
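The pipeline outlined in the abstract (pyramid regions per frame, a regional graph per video, graph convolution over that graph) can be illustrated with a minimal sketch. It is not the authors' implementation: the pyramid levels, the edge rules, the parameter-free propagation step, and every function name below (`pyramid_regions`, `build_video_graph`, `propagate`, `video_descriptor`, `cosine`) are assumptions made for illustration only, since the abstract does not specify them.

```python
import math

def pyramid_regions(fmap, levels=(1, 2)):
    """Average-pool an H x W x C feature map into n*n cells for each level n.
    Assumes H and W are at least as large as the largest level."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    regions = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                rows = range(i * h // n, (i + 1) * h // n)
                cols = range(j * w // n, (j + 1) * w // n)
                cells = [fmap[r][q] for r in rows for q in cols]
                regions.append([sum(v[k] for v in cells) / len(cells) for k in range(c)])
    return regions

def build_video_graph(frame_maps, levels=(1, 2)):
    """Nodes are regional features; edges follow two assumed rules."""
    nodes, edges = [], set()
    per_frame = sum(n * n for n in levels)
    for t, fmap in enumerate(frame_maps):
        base = t * per_frame
        nodes.extend(pyramid_regions(fmap, levels))
        # Intra-frame (assumed): link the frame's first region, which is the
        # whole frame when levels starts at 1, to every other region.
        for k in range(1, per_frame):
            edges.add((base, base + k))
        # Inter-frame (assumed): link each region to the same region
        # in the previous frame.
        if t > 0:
            for k in range(per_frame):
                edges.add((base - per_frame + k, base + k))
    return nodes, edges

def propagate(nodes, edges):
    """One step of D^-1/2 (A + I) D^-1/2 X, the GCN propagation rule
    without learned weights or a nonlinearity (a simplification)."""
    n, dim = len(nodes), len(nodes[0])
    neigh = [{i} for i in range(n)]  # self-loops
    for a, b in edges:
        neigh[a].add(b)
        neigh[b].add(a)
    deg = [len(s) for s in neigh]
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in neigh[i]:
            weight = 1.0 / math.sqrt(deg[i] * deg[j])
            for k in range(dim):
                row[k] += weight * nodes[j][k]
        out.append(row)
    return out

def video_descriptor(frame_maps, hops=1):
    """Mean of propagated node features, L2-normalized."""
    nodes, edges = build_video_graph(frame_maps)
    for _ in range(hops):
        nodes = propagate(nodes, edges)
    dim = len(nodes[0])
    mean = [sum(v[k] for v in nodes) / len(nodes) for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]

def cosine(u, v):
    """Cosine similarity of two already L2-normalized descriptors."""
    return sum(a * b for a, b in zip(u, v))
```

In this sketch, retrieval would rank database videos by `cosine` between their descriptors and the query's descriptor. The feature maps themselves would come from a pre-trained network, which is outside the scope of the example.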

Bibliographic record

  • Source
    Information Processing & Management | 2021, Issue 3 | 102488.1-102488.12 | 12 pages
  • Author affiliations

    School of Information, Renmin University of China, Beijing, China;

    School of Information, Renmin University of China, Beijing, China;

    School of Information, Renmin University of China, Beijing, China;

    School of Information, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; Data61, CSIRO, Pullenvale, Australia;

    School of Information, Renmin University of China, Beijing, China;

    School of Information, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China;

  • Indexing information
  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification
  • Keywords

    Graph embedding; Video retrieval; Regional graph; Pyramid feature map;

