IEEE Winter Conference on Applications of Computer Vision

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval


Abstract

The goal of weakly-supervised video moment retrieval is to localize the video segment most relevant to a description without access to temporal annotations during training. Prior work uses co-attention mechanisms to model relationships between the vision and language data, but these approaches lack contextual information between video frames that can be useful for determining how well a segment relates to the query. To address this, we propose an efficient Latent Graph Co-Attention Network (LoGAN) that exploits fine-grained frame-by-word interactions to jointly reason about the correspondences between all possible pairs of frames, providing context cues absent in prior work. Experiments on the DiDeMo and Charades-STA datasets demonstrate the effectiveness of our approach, where we improve Recall@1 by 5-20% over prior weakly-supervised methods, even boasting an 11% gain over strongly-supervised methods on DiDeMo, while also using significantly fewer model parameters than other co-attention mechanisms.
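The abstract names two mechanisms: fine-grained frame-by-word co-attention, and a latent graph that relates all pairs of frames to propagate context. The following is a minimal NumPy sketch of those two ideas only; the function names, dot-product scoring, and softmax normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_word_coattention(frames, words):
    """Fine-grained frame-by-word co-attention (illustrative).

    frames: (T, d) frame features; words: (N, d) word features.
    Each frame attends over all words, and each word over all frames.
    """
    scores = frames @ words.T / np.sqrt(frames.shape[1])   # (T, N) affinity
    attended_words = softmax(scores, axis=1) @ words        # (T, d) per-frame language context
    attended_frames = softmax(scores.T, axis=1) @ frames    # (N, d) per-word visual context
    return attended_words, attended_frames

def latent_frame_graph(frame_reprs):
    """Latent graph over all frame pairs (illustrative).

    Builds a soft adjacency from pairwise frame affinities and
    propagates context between frames through it.
    """
    sim = frame_reprs @ frame_reprs.T / np.sqrt(frame_reprs.shape[1])  # (T, T)
    adj = softmax(sim, axis=1)       # each row is a distribution over frames
    context = adj @ frame_reprs      # context-enriched frame representations
    return adj, context
```

In a full model these blocks would operate on learned projections and be stacked, with the context-enriched frame representations scored against the query to rank candidate moments; this sketch only shows the shape of the two interactions.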
