Source: IEEE Transactions on Pattern Analysis and Machine Intelligence

Recurrent Temporal Aggregation Framework for Deep Video Inpainting



Abstract

Video inpainting aims to fill in spatio-temporal holes in videos with plausible content. Despite tremendous progress in deep learning-based inpainting of single images, extending these methods to the video domain remains challenging due to the additional time dimension. In this paper, we propose a recurrent temporal aggregation framework for fast deep video inpainting. In particular, we construct an encoder-decoder model, where the encoder takes multiple reference frames that can provide visible pixels revealed by the scene dynamics. These hints are aggregated and fed into the decoder. We apply recurrent feedback in an auto-regressive manner to enforce temporal consistency in the video results. We propose two architectural designs based on this framework. Our first model is a blind video decaptioning network (BVDNet) designed to automatically remove and inpaint text overlays in videos without any mask information. Our BVDNet won first place in the ECCV ChaLearn 2018 LAP Inpainting Competition Track 2: Video Decaptioning. Second, we propose a network for more general video inpainting (VINet) to deal with more arbitrary and larger holes. Video results demonstrate the advantage of our framework over state-of-the-art methods both qualitatively and quantitatively. The codes are available at https://github.com/mcahny/Deep-Video-Inpainting and https://github.com/shwoo93/video_decaptioning.
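The core idea in the abstract, aggregating visible pixels from multiple reference frames to fill holes, then applying auto-regressive feedback from the previous output for temporal consistency, can be illustrated with a minimal non-learned sketch. This is not the paper's network (BVDNet/VINet are deep encoder-decoder models); it is a hypothetical pixel-level analogue where all function names and parameters (`inpaint_frame`, `inpaint_video`, `window`, `blend`) are illustrative assumptions:

```python
import numpy as np

def inpaint_frame(target, mask, refs, ref_masks, prev_out=None, blend=0.5):
    """Fill hole pixels (mask == 1) in `target` by averaging pixels that are
    visible (ref mask == 0) in the reference frames, then blend the hole
    region with the previous output frame (recurrent, auto-regressive
    feedback) to encourage temporal consistency."""
    out = target.astype(np.float64).copy()
    vis = np.stack([1.0 - m for m in ref_masks])             # (R, H, W) visibility
    stack = np.stack([r.astype(np.float64) for r in refs])   # (R, H, W) pixels
    counts = vis.sum(axis=0)
    # Temporal aggregation: average over references where the pixel is visible.
    agg = np.where(counts > 0,
                   (stack * vis).sum(axis=0) / np.maximum(counts, 1.0),
                   0.0)
    hole = (mask == 1) & (counts > 0)
    out[hole] = agg[hole]
    if prev_out is not None:
        # Recurrent feedback: mix in the previously generated frame.
        out[mask == 1] = blend * out[mask == 1] + (1 - blend) * prev_out[mask == 1]
    return out

def inpaint_video(frames, masks, window=2):
    """Process frames in order, feeding each output back as the recurrent
    state for the next step; references come from a temporal window."""
    outputs, prev = [], None
    for t, (f, m) in enumerate(zip(frames, masks)):
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        refs = [frames[i] for i in range(lo, hi) if i != t]
        rmasks = [masks[i] for i in range(lo, hi) if i != t]
        prev = inpaint_frame(f, m, refs, rmasks, prev)
        outputs.append(prev)
    return outputs
```

In the actual framework the aggregation and blending are performed on learned encoder features and the recurrence is realized inside the decoder, but the data flow (reference-frame hints in, aggregated fill, previous output fed back) follows the same shape.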

