Pyramid regional graph representation learning for content-based video retrieval

Guoping Zhao; Mingyu Zhang; Yaxian Li; Jiajun Liu; Bingqing Zhang; Ji-Rong Wen

Abstract

Conventional video retrieval methods typically aggregate the visual feature representations of every frame into a single video-level feature, treating each frame as an isolated, static image. Such methods cannot model the intra-frame and inter-frame relationships among local regions, and are often vulnerable to the visual redundancy and noise introduced by various kinds of video transformation and editing, such as adding image patches or banners. From the perspective of video retrieval, a video's key information is more often than not conveyed by geometrically centered, dynamic visual content, whereas static areas tend to lie farther from the center and often exhibit heavy temporal visual redundancy. This phenomenon has rarely been investigated by conventional retrieval methods. In this article, we propose an unsupervised video retrieval method that simultaneously models intra-frame and inter-frame contextual information for video representation, using a graph topology constructed on top of pyramid regional feature maps. By decomposing each frame into a pyramid regional sub-graph and transforming a video into a regional graph, we use graph convolutional networks to extract features that incorporate information from multiple types of context. Our method is unsupervised and uses only the frame features extracted by a pre-trained network. We have conducted extensive experiments, which demonstrate that the proposed method outperforms state-of-the-art video retrieval methods.
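The pipeline outlined in the abstract (pyramid regions per frame, a graph across frames, graph convolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the pooling operator, the edge rules, or the layer form, so the max pooling, the fully connected intra-frame edges, the same-region links between adjacent frames, the single symmetric-normalized propagation step, and the names `pyramid_regions`, `build_video_graph`, `gcn_layer`, and `video_descriptor` are all assumptions made for illustration.

```python
import numpy as np

def pyramid_regions(fmap, levels=(1, 2)):
    """Pool a (C, H, W) feature map over an l x l grid for each pyramid level l.
    Assumption: max pooling; requires H and W >= max(levels)."""
    c, h, w = fmap.shape
    nodes = []
    for l in levels:
        ys = np.linspace(0, h, l + 1).astype(int)
        xs = np.linspace(0, w, l + 1).astype(int)
        for i in range(l):
            for j in range(l):
                region = fmap[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                nodes.append(region.max(axis=(1, 2)))
    return np.stack(nodes)  # (R, C), R = sum of l*l over levels

def build_video_graph(frame_maps, levels=(1, 2)):
    """Stack regional nodes of all frames and build one adjacency matrix.
    Assumed edges: all regions within a frame (intra-frame, self-loops
    included) and the same region in adjacent frames (inter-frame)."""
    feats = [pyramid_regions(f, levels) for f in frame_maps]
    t, r = len(feats), feats[0].shape[0]
    x = np.concatenate(feats)  # (T*R, C)
    a = np.zeros((t * r, t * r))
    idx = np.arange(r)
    for k in range(t):
        s = k * r
        a[s:s + r, s:s + r] = 1.0  # intra-frame block
        if k + 1 < t:
            a[s + idx, s + r + idx] = 1.0  # inter-frame links
            a[s + r + idx, s + idx] = 1.0
    return x, a

def gcn_layer(x, a, w):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W).
    Self-loops are already present in `a`."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    a_norm = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ w, 0.0)

def video_descriptor(x, a, w):
    """Mean-pool node features after one layer and L2-normalize."""
    v = gcn_layer(x, a, w).mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)
```

In such a sketch, `frame_maps` would be the per-frame feature maps from a pre-trained network, and two videos could be compared by the dot product of their descriptors. The weight matrix `w` is a placeholder; how the graph network's parameters are obtained without supervision is not described in the abstract.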

Bibliographic details

Source: Information Processing & Management, 2021, Issue 3, pp. 102488.1-102488.12 (12 pages)

Authors: Guoping Zhao; Mingyu Zhang; Yaxian Li; Jiajun Liu; Bingqing Zhang; Ji-Rong Wen

Author affiliations (in author order):

School of Information, Renmin University of China, Beijing, China

School of Information, Renmin University of China, Beijing, China

School of Information, Renmin University of China, Beijing, China

School of Information, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; Data61, CSIRO, Pullenvale, Australia

School of Information, Renmin University of China, Beijing, China

School of Information, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China

Original format: PDF
Language: English
Keywords: Graph embedding; Video retrieval; Regional graph; Pyramid feature map
