Conference: OnTheMove International Federated Conference

Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks

Abstract

Vision-to-language problems, such as video annotation or visual question answering, stand out from perceptual video understanding tasks (e.g., classification) through their cognitive nature and their tight connection to the field of natural language processing. While most current solutions to vision-to-language problems are inspired by machine translation methods and aim to map visual features directly to text, several recent results on image and video understanding have demonstrated the importance of specifically and formally representing the semantic content of a visual scene before reasoning over it and mapping it to natural language. This paper proposes a deep learning solution to the problem of generating structured descriptions for videos and evaluates it on a dataset of formally annotated videos, which was automatically generated as part of this work. The recorded results confirm the potential of the solution, indicating that it describes the semantic content of a video scene with accuracy similar to that of state-of-the-art natural language captioning models.
