
Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks

Abstract

Vision-to-language problems, such as video annotation or visual question answering, stand out from perceptual video understanding tasks (e.g., classification) through their cognitive nature and their tight connection to the field of natural language processing. While most current solutions to vision-to-language problems are inspired by machine translation methods and aim to map visual features directly to text, several recent results on image and video understanding have proven the importance of specifically and formally representing the semantic content of a visual scene before reasoning over it and mapping it to natural language. This paper proposes a deep learning solution to the problem of generating structured descriptions for videos and evaluates it on a dataset of formally annotated videos, which was automatically generated as part of this work. The recorded results confirm the potential of the solution, indicating that it describes the semantic content of a video scene with an accuracy similar to that of state-of-the-art natural language captioning models.
