Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks

Abstract

Vision-to-language problems, such as video annotation or visual question answering, stand out from perceptual video understanding tasks (e.g., classification) through their cognitive nature and their tight connection to the field of natural language processing. While most current solutions to vision-to-language problems are inspired by machine translation methods, aiming to map visual features directly to text, several recent results on image and video understanding have proven the importance of specifically and formally representing the semantic content of a visual scene before reasoning over it and mapping it to natural language. This paper proposes a deep learning solution to the problem of generating structured descriptions for videos and evaluates it on a dataset of formally annotated videos, which was automatically generated as part of this work. The recorded results confirm the potential of the solution, indicating that it describes the semantic content of a video scene with accuracy similar to that of state-of-the-art natural language captioning models.

Bibliographic details

Source
International conference on the move to meaningful internet systems; Conference on cooperative information systems; Conference on cloud and trusted computing; Conference on ontologies, databases, and applications of semantics, 2018, pp. 315-332 (18 pages)
Authors
Daniel Vasile; Thomas Lukasiewicz
Original format
PDF
Keywords
Structured video captioning; Video understanding

Similar literature


1. Video structured description technology based intelligence analysis of surveillance videos for public security applications [J]. Xu Zheng, Hu Chuanping, Mei Lin. Multimedia Tools and Applications. 2016, No. 19
2. Learning to use technological support for self-organised EFL learning: Web-based tasks and YouTube relayed instructional videos [J]. Jason R. Byrne. The JALT CALL Journal. 2010, No. 1
3. Videolization: knowledge graph based automated video generation from web content [J]. Kalender Murat, Eren M. Tolga, Wu Zonghuan, Multimedia Tools and Applications. 2018, No. 1
4. Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks [C]. Daniel Vasile, Thomas Lukasiewicz. OnTheMove International Federated Conference. 2018
5. Knowledge extraction in video through the interaction analysis of activities [D]. Florez, Omar U. 2013
6. Automated Pain Detection in Facial Videos of Children using Human-Assisted Transfer Learning [O]. Xiaojing Xu, Kenneth D. Craig, Damaris Diaz,
7. Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks [O]. Daniel Vasile, Thomas Lukasiewicz. 2018
