Conference paper: International Conference on Image Analysis and Processing

Towards Video Captioning with Naming: A Novel Dataset and a Multi-modal Approach

Abstract

Current approaches to movie description cannot refer to characters by their proper names and can only indicate people with a generic "someone" tag. In this paper we present two contributions towards video description architectures with naming capabilities. First, we collect and release an extension of the popular Montreal Video Annotation Dataset in which the visual appearance of each character is linked both through time and to textual mentions in the captions. Using a semi-automatic procedure, we annotate a total of 53k face tracks and 29k textual mentions across 92 movies. Second, to highlight and quantify the challenges of generating captions with names, we present several multi-modal approaches that solve the problem on already-generated captions.