Journal: Computational Intelligence

A hierarchical recurrent approach to predict scene graphs from a visual-attention-oriented perspective

Abstract

A scene graph provides a powerful intermediate knowledge structure for various visual tasks, including semantic image retrieval, image captioning, and visual question answering. In this paper, the task of predicting a scene graph for an image is formulated as two connected problems: recognizing the relationship triplets, structured as <subject-predicate-object>, and constructing the scene graph from the recognized triplets. For relationship triplet recognition, we develop a novel hierarchical recurrent neural network with a visual attention mechanism. The model consists of two attention-based recurrent neural networks organized hierarchically: the first network generates a topic vector for each relationship triplet, and the second predicts each word of that triplet conditioned on the topic vector. This design captures the compositional structure and contextual dependencies of an image and of the relationship triplets that describe its scene. For scene graph construction, we present an entity localization approach that uses the available attention information to determine the graph structure, together with an algorithm that automatically converts the generated relationship triplets into a scene graph. Extensive experiments on two widely used data sets verify the feasibility of the proposed approach.
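The abstract does not spell out the triplet-to-graph conversion algorithm. The Python sketch below shows one plausible way to turn recognized triplets into a graph, and it is not the paper's method. It assumes that each triplet arrives with a bounding box for its subject and object, which serves as a hypothetical stand-in for the attention-based entity localization. It also assumes that two mentions with the same label and an IoU of at least 0.5 denote the same entity. The names `Triplet`, `SceneGraph`, `build_scene_graph`, and the `iou_threshold` value are illustrative.

```python
from dataclasses import dataclass, field

# Axis-aligned box (x1, y1, x2, y2); an assumed stand-in for the
# attention-based entity localization described in the abstract.
Box = tuple[float, float, float, float]


@dataclass(frozen=True)
class Triplet:
    """One recognized <subject-predicate-object> triplet with localized entities."""
    subj: str
    subj_box: Box
    pred: str
    obj: str
    obj_box: Box


@dataclass
class SceneGraph:
    nodes: list[tuple[str, Box]] = field(default_factory=list)  # (label, box)
    edges: list[tuple[int, str, int]] = field(default_factory=list)  # (subject node, predicate, object node)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_scene_graph(triplets: list[Triplet], iou_threshold: float = 0.5) -> SceneGraph:
    """Hypothetical conversion: merge mentions with the same label and
    overlapping boxes into one node, then add one edge per distinct triplet."""
    graph = SceneGraph()

    def node_id(label: str, box: Box) -> int:
        # Reuse an existing node if it has the same label and overlaps enough.
        for i, (other_label, other_box) in enumerate(graph.nodes):
            if other_label == label and iou(other_box, box) >= iou_threshold:
                return i
        graph.nodes.append((label, box))
        return len(graph.nodes) - 1

    for t in triplets:
        s = node_id(t.subj, t.subj_box)
        o = node_id(t.obj, t.obj_box)
        edge = (s, t.pred, o)
        if edge not in graph.edges:  # drop duplicate relationships
            graph.edges.append(edge)
    return graph


example = [
    Triplet("man", (10, 10, 50, 100), "riding", "horse", (0, 40, 120, 160)),
    Triplet("man", (12, 8, 52, 98), "wearing", "hat", (15, 0, 45, 20)),
    Triplet("horse", (2, 42, 118, 158), "on", "grass", (0, 120, 200, 200)),
]
graph = build_scene_graph(example)
print([label for label, _ in graph.nodes])  # ['man', 'horse', 'hat', 'grass']
print(graph.edges)  # [(0, 'riding', 1), (0, 'wearing', 2), (1, 'on', 3)]
```

Merging by label plus box overlap is a simple de-duplication heuristic. Here the two "man" mentions collapse into a single node, so "hat" and "horse" attach to the same person instead of to two disconnected copies.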