IEEE Winter Conference on Applications of Computer Vision

Going Deeper With Semantics: Video Activity Interpretation Using Semantic Contextualization



Abstract

A deeper understanding of video activities extends beyond recognition of underlying concepts such as actions and objects: constructing deep semantic representations requires reasoning about the semantic relationships among these concepts, often beyond what is directly observed in the data. To this end, we propose an energy minimization framework that leverages large-scale commonsense knowledge bases, such as ConceptNet, to provide contextual cues for establishing semantic relationships among entities directly hypothesized from video. We express this mathematically in the language of Grenander's canonical pattern generator theory. We show that the use of prior encoded commonsense knowledge alleviates the need for large annotated training datasets and helps tackle imbalance in the training data. Using three publicly available datasets (Charades, Microsoft Visual Description Corpus, and Breakfast Actions), we show that the proposed model generates video interpretations of higher quality than those reported by state-of-the-art approaches, which have substantial training needs. Through extensive experiments, we show that commonsense knowledge from ConceptNet allows the proposed approach to handle challenges such as training data imbalance, weak features, and complex semantic relationships and visual scenes.

