IEEE Winter Conference on Applications of Computer Vision

Going Deeper With Semantics: Video Activity Interpretation Using Semantic Contextualization



Abstract

A deeper understanding of video activities extends beyond recognition of underlying concepts such as actions and objects: constructing deep semantic representations requires reasoning about the semantic relationships among these concepts, often beyond what is directly observed in the data. To this end, we propose an energy minimization framework that leverages large-scale commonsense knowledge bases, such as ConceptNet, to provide contextual cues for establishing semantic relationships among entities directly hypothesized from video. We express this mathematically in the language of Grenander's canonical pattern generator theory. We show that the use of prior encoded commonsense knowledge alleviates the need for large annotated training datasets and helps tackle imbalance in the training data. Using three publicly available datasets (Charades, Microsoft Visual Description Corpus, and Breakfast Actions), we show that the proposed model generates video interpretations of higher quality than those reported by state-of-the-art approaches, which have substantial training needs. Through extensive experiments, we show that commonsense knowledge from ConceptNet allows the proposed approach to handle challenges such as training data imbalance, weak features, and complex semantic relationships and visual scenes.

