IEEE Winter Conference on Applications of Computer Vision

LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval


Abstract

The goal of weakly-supervised video moment retrieval is to localize the video segment most relevant to a description without access to temporal annotations during training. Prior work uses co-attention mechanisms to model relationships between the vision and language data, but these approaches lack contextual information between video frames that can be useful for determining how well a segment relates to the query. To address this, we propose an efficient Latent Graph Co-Attention Network (LoGAN) that exploits fine-grained frame-by-word interactions to jointly reason about the correspondences between all possible pairs of frames, providing context cues absent in prior work. Experiments on the DiDeMo and Charades-STA datasets demonstrate the effectiveness of our approach, where we improve Recall@1 by 5-20% over prior weakly-supervised methods, even boasting an 11% gain over strongly-supervised methods on DiDeMo, while also using significantly fewer model parameters than other co-attention mechanisms.
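The abstract names two mechanisms: fine-grained frame-by-word co-attention, and a latent graph that relates all pairs of frames to propagate context. The following is a minimal NumPy sketch of those two ideas only; the function names, dot-product scoring, and softmax normalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_word_coattention(frames, words):
    """Fine-grained frame-by-word co-attention (illustrative).

    frames: (T, d) frame features; words: (N, d) word features.
    Each frame attends over all words, and each word over all frames.
    """
    scores = frames @ words.T / np.sqrt(frames.shape[1])   # (T, N) affinity
    attended_words = softmax(scores, axis=1) @ words        # (T, d) per-frame language context
    attended_frames = softmax(scores.T, axis=1) @ frames    # (N, d) per-word visual context
    return attended_words, attended_frames

def latent_frame_graph(frame_reprs):
    """Latent graph over all frame pairs (illustrative).

    Builds a soft adjacency from pairwise frame affinities and
    propagates context between frames through it.
    """
    sim = frame_reprs @ frame_reprs.T / np.sqrt(frame_reprs.shape[1])  # (T, T)
    adj = softmax(sim, axis=1)       # each row is a distribution over frames
    context = adj @ frame_reprs      # context-enriched frame representations
    return adj, context
```

In a full model these blocks would operate on learned projections and be stacked, with the context-enriched frame representations scored against the query to rank candidate moments; this sketch only shows the shape of the two interactions.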
