Annual Meeting of the Association for Computational Linguistics

Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video



Abstract

In this paper, we address a novel task, namely weakly-supervised spatio-temporally grounding natural sentence in video. Specifically, given a natural sentence and a video, we localize a spatio-temporal tube in the video that semantically corresponds to the given sentence, with no reliance on any spatio-temporal annotations during training. First, a set of spatio-temporal tubes, referred to as instances, is extracted from the video. We then encode these instances and the sentence using our proposed attentive interactor, which can exploit their fine-grained relationships to characterize their matching behaviors. Besides a ranking loss, a novel diversity loss is introduced to train the proposed attentive interactor, strengthening the matching behaviors of reliable instance-sentence pairs and penalizing the unreliable ones. Moreover, we also contribute a dataset, called VID-sentence, based on the ImageNet video object detection dataset, to serve as a benchmark for our task. Extensive experimental results demonstrate the superiority of our model over the baseline approaches.
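The abstract describes training the attentive interactor with a ranking loss plus a diversity loss over instance-sentence matching scores. The following is a minimal PyTorch sketch of one plausible formulation of these two terms, assuming an (N, N) matrix of aggregated video-sentence scores and an (N, K) matrix of per-instance scores; the function names, the margin value, and the entropy-based form of the diversity loss are illustrative assumptions, not the paper's exact definitions.

```python
import torch
import torch.nn.functional as F

def ranking_loss(scores, margin=0.2):
    """Triplet-style ranking loss over a batch of video-sentence matching scores.

    `scores` is an (N, N) matrix where scores[i, j] is the aggregated matching
    score between video i's instances and sentence j; the diagonal holds the
    positive (paired) scores. This is a generic hinge formulation, not
    necessarily the exact one used in the paper.
    """
    pos = scores.diag().view(-1, 1)                       # (N, 1) paired scores
    cost_s = F.relu(margin + scores - pos)                # rank over sentences
    cost_v = F.relu(margin + scores - pos.t())            # rank over videos
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_s = cost_s.masked_fill(mask, 0.0)                # ignore the positives
    cost_v = cost_v.masked_fill(mask, 0.0)
    return cost_s.mean() + cost_v.mean()

def diversity_loss(instance_scores):
    """Sharpen the score distribution over a video's candidate instances.

    `instance_scores` is an (N, K) matrix of matching scores between the K
    candidate spatio-temporal tubes of each video and its paired sentence.
    Minimizing the entropy of the softmax-normalized scores strengthens
    reliable instance-sentence pairs and suppresses unreliable ones (one
    plausible reading of the paper's diversity loss; details may differ).
    """
    probs = F.softmax(instance_scores, dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)
    return entropy.mean()
```

In training, the total objective would combine the two terms, e.g. `ranking_loss(scores) + lam * diversity_loss(instance_scores)`, where the weight `lam` is a hypothetical hyperparameter tuned on validation data.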
