Journal: Expert Systems with Applications

Learning user interest with improved triplet deep ranking and web-image priors for topic-related video summarization



Abstract

Video summarization enables rapid browsing and efficient video indexing in many video-browsing applications, such as sports video highlights and dynamic video covers. In these applications, the key challenge is to generate summaries that capture the video content users find interesting. While many existing methods generate summaries from low-level features, this paper first proposes to mine large-scale Flickr images, identifying "interest" and "non-interest" images for the same query in order to learn what users find interesting. Unlike existing pairwise ranking-based methods for video summarization, we then propose an improved triplet deep ranking model that converges more easily, learns the relationship between "interest" and "non-interest" Flickr images, and discovers which visual content of the original video users actually prefer. During training, triplets (interest image p+, interest image p'+, non-interest image p'') are selected as input to train a model with three parallel deep convolutional networks. During summarization, an efficient entropy-based video segmentation method divides the original video into segments, and the visual interest score of each segment is estimated with the trained ranking network (SumNet). An optimal subset of segments is then selected to form a summary that captures the interesting visual content. We evaluate and compare our method against several state-of-the-art methods; experimental results show that it improves on the best baseline by 9.6% in mean Average Precision (mAP) accuracy.
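The abstract does not give the exact loss used by the three-branch network, but a standard hinge-style triplet ranking objective over the triplet (p+, p'+, p'') can be sketched as follows. This is a minimal illustrative sketch with toy numpy embeddings standing in for the CNN branch outputs, not the authors' implementation; the function name and margin value are assumptions.

```python
import numpy as np

def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (illustrative, not the paper's exact loss):
    pull the two 'interest' embeddings (anchor p+, positive p'+) together
    and push the 'non-interest' embedding (p'') at least `margin` further
    away, measured by squared Euclidean distance."""
    d_pos = np.sum((anchor - positive) ** 2)   # distance between the two interest images
    d_neg = np.sum((anchor - negative) ** 2)   # distance to the non-interest image
    return max(0.0, d_pos - d_neg + margin)

# Toy 4-d embeddings standing in for the three parallel CNN branch outputs.
p_plus  = np.array([1.0, 0.0, 0.0, 0.0])   # interest image p+
p_plus2 = np.array([0.9, 0.1, 0.0, 0.0])   # interest image p'+
p_minus = np.array([0.0, 0.0, 1.0, 0.0])   # non-interest image p''

print(triplet_ranking_loss(p_plus, p_plus2, p_minus))  # well-separated triplet: loss is 0.0
```

Triplets that already satisfy the margin contribute zero loss, so training gradients come only from triplets where a non-interest image sits closer to the anchor than the paired interest image.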
