Quarterly of Applied Mathematics

GENERATING OPEN WORLD DESCRIPTIONS OF VIDEO USING COMMON SENSE KNOWLEDGE IN A PATTERN THEORY FRAMEWORK

Abstract

The task of interpreting activities captured in video extends beyond the recognition of observed actions and objects. It involves open world reasoning and the construction of deep semantic connections that go beyond what is directly observed in the video and annotated in the training data; prior knowledge plays a large role. Grenander's canonical pattern theory representation offers an elegant mechanism to capture these semantic connections between what is observed directly in the image and past knowledge in large-scale common sense knowledge bases, such as ConceptNet. We represent interpretations using a connected structure of basic detected (grounded) concepts, such as objects and actions, that are bound by semantics to other background concepts not directly observed, i.e., contextualization cues. Concepts are basic generators, and the bonds are defined by the semantic relationships between concepts. Local and global regularity constraints govern these bonds and the overall connection structure. We use an inference engine based on energy minimization via an efficient Markov chain Monte Carlo sampler that uses ConceptNet in its move proposals to find the structures that describe the image content. Using four different publicly available large datasets, Charades, Microsoft Video Description Corpus (MSVD), Breakfast Actions, and CMU Kitchen, we show that the proposed model can generate video interpretations whose quality is comparable to or better than those reported by state-of-the-art approaches, such as different forms of deep learning models, graphical models, and context-free grammars. Apart from the increased performance, the use of encoded common sense knowledge sources alleviates the need for large annotated training datasets and helps tackle any imbalance in the data through prior knowledge, which is the bane of current machine learning approaches.
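The scheme the abstract describes, grounded generators bound by semantic bonds, an energy that rewards strong bonds and penalizes unregularized additions, and an MCMC search whose move proposals draw on a knowledge base, can be sketched in miniature. The affinity table, concept lists, and penalty weight below are illustrative assumptions standing in for ConceptNet relatedness scores and the paper's actual energy terms, not the authors' implementation.

```python
import math
import random

# Toy semantic affinities standing in for ConceptNet relatedness scores
# (all values here are illustrative, not real ConceptNet data).
AFFINITY = {
    ("knife", "cut"): 0.9,
    ("knife", "kitchen"): 0.7,
    ("cut", "vegetable"): 0.8,
    ("kitchen", "vegetable"): 0.6,
    ("knife", "garage"): 0.1,
    ("cut", "garage"): 0.1,
}

def affinity(a, b):
    """Symmetric bond strength between two concepts (0 if unrelated)."""
    return AFFINITY.get((a, b), AFFINITY.get((b, a), 0.0))

GROUNDED = ["knife", "cut"]                   # concepts detected in the video
CONTEXT = ["kitchen", "vegetable", "garage"]  # candidate contextualization cues
PENALTY = 0.5                                 # regularity: cost per added cue

def energy(cues):
    # Lower energy = better interpretation: reward strong bonds between all
    # pairs of concepts, penalize each ungrounded concept that is added.
    concepts = GROUNDED + sorted(cues)
    bond = sum(affinity(a, b)
               for i, a in enumerate(concepts) for b in concepts[i + 1:])
    return PENALTY * len(cues) - bond

def mcmc(steps=2000, temp=0.3, seed=0):
    """Metropolis sampler over subsets of context cues."""
    rng = random.Random(seed)
    cues = set()
    best, best_e = set(cues), energy(cues)
    for _ in range(steps):
        c = rng.choice(CONTEXT)        # move proposal: toggle one cue
        proposal = cues ^ {c}
        d_e = energy(proposal) - energy(cues)
        if d_e < 0 or rng.random() < math.exp(-d_e / temp):
            cues = proposal
        if energy(cues) < best_e:
            best, best_e = set(cues), energy(cues)
    return best, best_e
```

On this toy instance the sampler settles on the cues "kitchen" and "vegetable" and rejects "garage", whose weak bonds do not pay for the regularity penalty; the real system explores a far larger space of generators and bond types, but the accept/reject logic is the same.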
