Quarterly of Applied Mathematics

GENERATING OPEN WORLD DESCRIPTIONS OF VIDEO USING COMMON SENSE KNOWLEDGE IN A PATTERN THEORY FRAMEWORK

Abstract

The task of interpreting activities captured in video extends beyond the recognition of observed actions and objects. It involves open world reasoning and the construction of deep semantic connections that go beyond what is directly observed in the video and annotated in the training data; prior knowledge plays a large role. Grenander's canonical pattern theory representation offers an elegant mechanism to capture these semantic connections between what is observed directly in the image and past knowledge in large-scale common sense knowledge bases, such as ConceptNet. We represent interpretations using a connected structure of basic detected (grounded) concepts, such as objects and actions, that are bound by semantics to other background concepts not directly observed, i.e., contextualization cues. Concepts are basic generators, and the bonds are defined by the semantic relationships between concepts. Local and global regularity constraints govern these bonds and the overall connection structure. We use an inference engine based on energy minimization via an efficient Markov chain Monte Carlo sampler that uses ConceptNet in its move proposals to find the structures that describe the image content. Using four different publicly available large datasets, Charades, Microsoft Video Description Corpus (MSVD), Breakfast Actions, and CMU Kitchen, we show that the proposed model can generate video interpretations whose quality is comparable to or better than those reported by state-of-the-art approaches, such as different forms of deep learning models, graphical models, and context-free grammars. Apart from the increased performance, the use of encoded common sense knowledge sources alleviates the need for large annotated training datasets and helps tackle any imbalance in the data through prior knowledge, which is the bane of current machine learning approaches.
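The scheme the abstract describes, grounded generators bound by semantic bonds, an energy that rewards strong bonds and penalizes unregularized additions, and an MCMC search whose move proposals draw on a knowledge base, can be sketched in miniature. The affinity table, concept lists, and penalty weight below are illustrative assumptions standing in for ConceptNet relatedness scores and the paper's actual energy terms, not the authors' implementation.

```python
import math
import random

# Toy semantic affinities standing in for ConceptNet relatedness scores
# (all values here are illustrative, not real ConceptNet data).
AFFINITY = {
    ("knife", "cut"): 0.9,
    ("knife", "kitchen"): 0.7,
    ("cut", "vegetable"): 0.8,
    ("kitchen", "vegetable"): 0.6,
    ("knife", "garage"): 0.1,
    ("cut", "garage"): 0.1,
}

def affinity(a, b):
    """Symmetric bond strength between two concepts (0 if unrelated)."""
    return AFFINITY.get((a, b), AFFINITY.get((b, a), 0.0))

GROUNDED = ["knife", "cut"]                   # concepts detected in the video
CONTEXT = ["kitchen", "vegetable", "garage"]  # candidate contextualization cues
PENALTY = 0.5                                 # regularity: cost per added cue

def energy(cues):
    # Lower energy = better interpretation: reward strong bonds between all
    # pairs of concepts, penalize each ungrounded concept that is added.
    concepts = GROUNDED + sorted(cues)
    bond = sum(affinity(a, b)
               for i, a in enumerate(concepts) for b in concepts[i + 1:])
    return PENALTY * len(cues) - bond

def mcmc(steps=2000, temp=0.3, seed=0):
    """Metropolis sampler over subsets of context cues."""
    rng = random.Random(seed)
    cues = set()
    best, best_e = set(cues), energy(cues)
    for _ in range(steps):
        c = rng.choice(CONTEXT)        # move proposal: toggle one cue
        proposal = cues ^ {c}
        d_e = energy(proposal) - energy(cues)
        if d_e < 0 or rng.random() < math.exp(-d_e / temp):
            cues = proposal
        if energy(cues) < best_e:
            best, best_e = set(cues), energy(cues)
    return best, best_e
```

On this toy instance the sampler settles on the cues "kitchen" and "vegetable" and rejects "garage", whose weak bonds do not pay for the regularity penalty; the real system explores a far larger space of generators and bond types, but the accept/reject logic is the same.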
