IEEE/CVF Conference on Computer Vision and Pattern Recognition

Grounding Referring Expressions in Images by Variational Context


Abstract

We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., 'largest elephant standing behind baby elephant'. This is a general yet challenging vision-language task, since it requires not only the localization of objects but also multimodal comprehension of context: the visual attributes (e.g., 'largest', 'baby') and relationships (e.g., 'behind') that distinguish the referent from other objects, especially those of the same category. Due to the exponential complexity of modeling the context associated with multiple image regions, existing work oversimplifies this task to pairwise region modeling via multiple instance learning. In this paper, we propose a variational Bayesian method, called Variational Context, to solve the problem of complex context modeling in referring expression grounding. Our model exploits the reciprocal relation between the referent and context: each influences the estimation of the posterior distribution of the other, so the search space of context can be greatly reduced. We also extend the model to the unsupervised setting, where no annotation for the referent is available. Extensive experiments on various benchmarks show consistent improvement over state-of-the-art methods in both supervised and unsupervised settings. The code is available at https://github.com/yuleiniu/vc/.
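To make the "reciprocal relation" concrete, the following is a minimal sketch of the cue-attend-score pattern the abstract describes: the expression first cues an attention distribution over regions acting as context, and each region is then scored as the referent conditioned on the expected context feature, so the number of scored (region, context) combinations stays linear in the number of regions. All names, dimensions, and the bilinear scoring forms here are illustrative assumptions, not the architecture from the paper or its repository.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

# Illustrative inputs: n candidate regions with d-dim visual features and a
# d-dim embedding of the referring expression (both random placeholders).
n, d = 5, 8
rng = np.random.default_rng(0)
regions = rng.normal(size=(n, d))
expr = rng.normal(size=d)

# Bilinear scoring weights (placeholders; the paper learns its own modules).
W_ctx = 0.1 * rng.normal(size=(d, d))
W_ref = 0.1 * rng.normal(size=(2 * d, d))

def ground(regions, expr):
    # Attend over regions acting as *context*, cued by the expression
    # (a crude stand-in for the variational context posterior q).
    ctx_att = softmax(regions @ W_ctx @ expr)
    ctx_feat = ctx_att @ regions  # expected context feature under q
    # Score each region as the *referent* given that context, so only n
    # (region, context) pairs are evaluated rather than all n^2 pairs.
    paired = np.concatenate([regions, np.tile(ctx_feat, (n, 1))], axis=1)
    return softmax(paired @ W_ref @ expr)

print(ground(regions, expr))  # posterior over which region is the referent
```

In the actual model the referent and context estimates refine each other; this one-pass sketch shows only the context-to-referent direction.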
