
Iterative Visual Reasoning Beyond Convolutions




We present a novel framework for iterative visual reasoning. Our framework goes beyond current recognition systems, which lack the capability to reason beyond a stack of convolutions. The framework consists of two core modules: a local module that uses spatial memory [4] to store previous beliefs with parallel updates; and a global graph-reasoning module. Our graph module has three components: a) a knowledge graph, where we represent classes as nodes and build edges to encode different types of semantic relationships between them; b) a region graph of the current image, where regions in the image are nodes and spatial relationships between these regions are edges; c) an assignment graph that assigns regions to classes. Both the local module and the global module roll out iteratively and cross-feed predictions to each other to refine estimates. The final predictions are made by combining the best of both modules with an attention mechanism. We show strong performance over plain ConvNets, e.g., achieving an 8.4% absolute improvement on ADE [55] measured by per-class average precision. Analysis also shows that the framework is resilient to missing regions for reasoning.
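The attention-based combination of the two modules' predictions can be illustrated with a minimal sketch. The abstract does not give the exact form of the attention mechanism, so this assumes each module outputs per-class logits for a region together with one scalar attention score, and that the scores are normalized with a softmax before a weighted sum. The names `softmax` and `combine_predictions`, and the example values, are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_predictions(logits_per_module, attention_scores):
    """Fuse per-module class logits using softmax attention weights.

    Assumption: one scalar attention score per module prediction;
    the paper's actual mechanism may differ.
    """
    if len(logits_per_module) != len(attention_scores):
        raise ValueError("need exactly one attention score per module")
    weights = softmax(attention_scores)
    num_classes = len(logits_per_module[0])
    fused = [0.0] * num_classes
    for w, logits in zip(weights, logits_per_module):
        if len(logits) != num_classes:
            raise ValueError("all modules must predict the same classes")
        for c, value in enumerate(logits):
            fused[c] += w * value
    return fused, weights


# Hypothetical logits for one region over three classes.
local_logits = [2.0, 0.5, -1.0]    # from the local (spatial memory) module
global_logits = [0.0, 3.0, -1.0]   # from the global (graph) module

# Equal attention scores give each module the same weight.
fused, weights = combine_predictions([local_logits, global_logits], [0.0, 0.0])
```

Raising one module's attention score shifts the fused logits toward that module's prediction, which is one plausible way a model could rely on whichever module is more trustworthy for a given region.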


