
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection

Abstract

TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single prediction map is produced from an input clip of multiple frames. Frame-wise saliency maps can be predicted by applying TASED-Net to a video in a sliding-window fashion. The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with the temporal-aggregation method is effective. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale datasets of video saliency detection: DHF1K, Hollywood2, and UCFSports. After analyzing the results qualitatively, we observe that our model is especially better at attending to salient moving objects.
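For concreteness, here is a minimal PyTorch sketch of the two building blocks the abstract describes, together with sliding-window inference. It is an illustration rather than the authors' implementation: the class name TinyTASED, all layer sizes, the clip length clip_len=8, and the use of simple mean pooling over the time axis as the temporal-aggregation step are assumptions made for this sketch (the paper's network is far deeper and learns its temporal aggregation).

```python
# Sketch of the TASED-Net idea from the abstract: a 3D-convolutional encoder
# compresses a clip of consecutive frames into low-resolution spatiotemporal
# features, and a prediction network collapses the temporal axis while
# upsampling spatially, producing one saliency map per clip.
import torch
import torch.nn as nn

class TinyTASED(nn.Module):  # hypothetical toy model, not the paper's network
    def __init__(self):
        super().__init__()
        # Encoder: strided 3D convolutions shrink spatial resolution while
        # mixing information across frames.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )
        # Prediction network: after temporal aggregation, decode spatially
        # back to the input resolution with transposed 2D convolutions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, clip):                        # clip: (B, 3, T, H, W)
        feats = self.encoder(clip)                  # (B, 64, T', H/4, W/4)
        feats = feats.mean(dim=2)                   # temporal aggregation (mean is a stand-in)
        return torch.sigmoid(self.decoder(feats))   # single map: (B, 1, H, W)

def frame_wise_saliency(model, video, clip_len=8):
    """Sliding-window inference: each frame's saliency map is predicted from
    a clip of that frame and a limited number of past frames.
    video: (3, N, H, W) tensor of N RGB frames."""
    maps = []
    for t in range(video.shape[1]):
        clip = video[:, max(0, t - clip_len + 1):t + 1]
        # Repeat frame 0 at the start of the video so every clip has
        # clip_len frames (a simplifying assumption for this sketch).
        if clip.shape[1] < clip_len:
            pad = clip[:, :1].repeat(1, clip_len - clip.shape[1], 1, 1)
            clip = torch.cat([pad, clip], dim=1)
        maps.append(model(clip.unsqueeze(0)))
    return torch.cat(maps, dim=0)                   # (N, 1, H, W)

model = TinyTASED().eval()
video = torch.rand(3, 16, 64, 64)                   # 16 random 64x64 frames
with torch.no_grad():
    sal = frame_wise_saliency(model, video)
print(sal.shape)                                    # torch.Size([16, 1, 64, 64])
```

Running the snippet yields one full-resolution map per frame, each computed from only the current and preceding frames, which mirrors the sliding-window scheme and the limited-past-frames assumption stated in the abstract.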
