Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions



Abstract

We focus on grounding (i.e., localizing or linking) referring expressions in images, e.g., "largest elephant standing behind baby elephant". This is a general yet challenging vision-language task, since it requires not only the localization of objects but also the multimodal comprehension of context: visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category. Because modeling the context associated with multiple image regions has exponential complexity, existing work oversimplifies the task to pairwise region modeling by multiple instance learning. In this paper, we propose a variational Bayesian method, called Variational Context, to solve the problem of complex context modeling in referring expression grounding. Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., each of them influences the estimation of the posterior distribution of the other, so the search space of context can be greatly reduced. In addition to reciprocity, our framework considers the semantic information of context, i.e., the referring expression can be reproduced based on the estimated context. We also extend the model to the unsupervised setting, where no annotation for the referent is available. Extensive experiments on various benchmarks show consistent improvement over state-of-the-art methods in both supervised and unsupervised settings.
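The abstract does not give the training objective, so the following is only a minimal sketch of the generic variational lower bound that a variational Bayesian treatment of a discrete latent context would rest on. It is not the authors' implementation. The context is treated as a latent variable z ranging over candidate image regions, and every name here (`log_marginal`, `elbo`, `true_posterior`, the toy scores) is a hypothetical chosen for illustration.

```python
import numpy as np

def log_marginal(log_prior, log_lik):
    """Exact log p(x) = log sum_z p(z) p(x|z), summed over every candidate context z."""
    return np.logaddexp.reduce(np.asarray(log_prior) + np.asarray(log_lik))

def elbo(q, log_prior, log_lik):
    """Textbook lower bound E_q[log p(x|z)] - KL(q(z) || p(z)) for a discrete latent z."""
    q = np.asarray(q, dtype=float)
    log_prior = np.asarray(log_prior, dtype=float)
    log_lik = np.asarray(log_lik, dtype=float)
    mask = q > 0  # terms with q(z) = 0 contribute nothing
    expected_log_lik = np.sum(q[mask] * log_lik[mask])
    kl = np.sum(q[mask] * (np.log(q[mask]) - log_prior[mask]))
    return expected_log_lik - kl

def true_posterior(log_prior, log_lik):
    """Exact posterior p(z|x), available here only because the toy problem is tiny."""
    joint = np.asarray(log_prior) + np.asarray(log_lik)
    return np.exp(joint - np.logaddexp.reduce(joint))

# Toy setup (assumed values): 5 candidate context regions, uniform prior,
# random likelihood scores for one fixed referent.
rng = np.random.default_rng(0)
n_regions = 5
log_prior = np.log(np.full(n_regions, 1.0 / n_regions))
log_lik = np.log(rng.uniform(0.05, 1.0, size=n_regions))

uniform_q = np.full(n_regions, 1.0 / n_regions)
exact = log_marginal(log_prior, log_lik)
bound_uniform = elbo(uniform_q, log_prior, log_lik)
bound_tight = elbo(true_posterior(log_prior, log_lik), log_prior, log_lik)
```

In this toy, `bound_uniform` never exceeds `exact`, and `bound_tight` matches it because the approximate posterior equals the true one. A better estimate of the context given the referent tightens the bound on the referent, which is one way to read the reciprocity described in the abstract.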
