
A Deep Ranking Model for Spatio-Temporal Highlight Detection from a 360° Video



Abstract

We address the problem of highlight detection from a 360° video by summarizing it both spatially and temporally. Given a long 360° video, we spatially select pleasant-looking normal field-of-view (NFOV) segments from the unlimited field of view (FOV) of the 360° video, and temporally summarize it into a concise and informative highlight as a selected subset of subshots. We propose a novel deep ranking model named the Composition View Score (CVS) model, which produces a spherical score map of composition per video segment and determines which view is suitable for the highlight via a sliding window kernel at inference. To evaluate the proposed framework, we perform experiments on the Pano2Vid benchmark dataset (Su, Jayaraman, and Grauman 2016) and our newly collected 360° video highlight dataset from YouTube and Vimeo. Through evaluation using both quantitative summarization metrics and user studies via Amazon Mechanical Turk, we demonstrate that our approach outperforms several state-of-the-art highlight detection methods. We also show that our model is 16 times faster at inference than AutoCam (Su, Jayaraman, and Grauman 2016), which is one of the first summarization algorithms for 360° videos.
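
To make the inference step concrete: the abstract describes selecting the best NFOV view by sliding a window kernel over a per-segment spherical score map. The following Python sketch is only an illustration of that idea, not the paper's actual CVS model; it assumes the score map is stored as an equirectangular grid, aggregates each candidate window by its mean score, and wraps around the longitude seam. The function name best_nfov_view, the grid resolution, and the window size are all hypothetical.

    import numpy as np

    def best_nfov_view(score_map, kernel_h, kernel_w):
        # score_map: (H, W) equirectangular grid of composition scores,
        # rows ~ latitude, columns ~ longitude (wraps around 360°).
        H, W = score_map.shape
        # Pad horizontally so candidate windows can cross the longitude seam.
        padded = np.concatenate([score_map, score_map[:, :kernel_w]], axis=1)
        best_score, best_center = -np.inf, None
        for top in range(H - kernel_h + 1):
            for left in range(W):
                window = padded[top:top + kernel_h, left:left + kernel_w]
                score = window.mean()
                if score > best_score:
                    best_score = score
                    best_center = (top + kernel_h // 2, (left + kernel_w // 2) % W)
        return best_center, best_score

    # Example: a coarse 18 x 36 score grid with a roughly 60° x 90° window.
    scores = np.random.rand(18, 36)
    center, score = best_nfov_view(scores, kernel_h=6, kernel_w=9)
    print("best NFOV centre (row, col):", center, "score:", score)

Running this per video segment yields one candidate NFOV view per segment; the paper's temporal summarization then selects a subset of such subshots to form the highlight.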
