Learning Aligned Image-Text Representations Using Graph Attentive Relational Network

Abstract

Image-text matching, which has seen great progress recently, aims to measure the similarity between images and textual descriptions. The key to this cross-modal matching task is to build the latent semantic alignment between visual objects and words. Because sentence structures vary widely, it is very difficult to learn this latent semantic alignment using only global cross-modal features. Many previous methods attempt to learn aligned image-text representations with an attention mechanism but generally ignore the relationships within textual descriptions that determine whether words belong to the same visual object. In this paper, we propose a graph attentive relational network (GARN) that learns aligned image-text representations for identity-aware image-text matching by modeling the relationships between noun phrases in a text. In the GARN, we first decompose images and texts into regions and noun phrases, respectively. Then a skip graph neural network (skip-GNN) is proposed to learn effective textual representations, which are a mixture of textual features and relational features. Finally, a graph attention network is proposed to obtain the probabilities that noun phrases belong to image regions by modeling the relationships between the noun phrases. We perform extensive experiments on the CUHK Person Description dataset (CUHK-PEDES), the Caltech-UCSD Birds dataset (CUB), the Oxford-102 Flowers dataset, and the Flickr30K dataset to verify the effectiveness of each component of our model. Experimental results show that our approach achieves state-of-the-art results on these four benchmark datasets.
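To make the alignment step concrete, below is a minimal sketch of a single graph-attention pass that scores how likely each noun phrase is to belong to each image region. This is not the authors' implementation: the function name, feature dimensions, dot-product attention, and cosine-similarity matching are all illustrative assumptions.

    # Minimal illustrative sketch (PyTorch), not the paper's code: dimensions,
    # the dot-product attention, and the cosine similarity are assumptions.
    import torch
    import torch.nn.functional as F

    def graph_attentive_alignment(phrase_feats, region_feats, adjacency):
        # phrase_feats: (P, d) noun-phrase embeddings from a textual encoder.
        # region_feats: (R, d) image-region embeddings.
        # adjacency:    (P, P) relation graph over phrases; assumes each row
        #               has at least one neighbour (e.g. self-loops).
        # Returns a (P, R) matrix of probabilities that phrase i matches region j.

        # Phrase-to-phrase attention over the relation graph, so each phrase
        # aggregates context from related phrases before matching regions.
        logits = phrase_feats @ phrase_feats.t()               # (P, P)
        logits = logits.masked_fill(adjacency == 0, float('-inf'))
        neighbour_weights = F.softmax(logits, dim=-1)          # (P, P)
        phrase_ctx = neighbour_weights @ phrase_feats          # relation-aware phrases

        # Cross-modal matching: cosine similarity, normalised per phrase into
        # a probability distribution over the image regions.
        sim = F.normalize(phrase_ctx, dim=-1) @ F.normalize(region_feats, dim=-1).t()
        return F.softmax(sim, dim=-1)                          # (P, R)

    # Toy usage: 5 noun phrases, 7 regions, 256-d features, fully connected graph.
    probs = graph_attentive_alignment(
        torch.randn(5, 256), torch.randn(7, 256), torch.ones(5, 5))
    print(probs.shape)  # torch.Size([5, 7])

In the paper, the phrase features would come from the proposed skip-GNN, which mixes textual and relational features; here random features stand in for both encoders.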

Bibliographic Details

  • Source

    IEEE Transactions on Image Processing | 2021, No. 1 | pp. 1840-1852 | 13 pages
  • Author Affiliation

    National Laboratory of Pattern Recognition, Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China (listed identically for all four authors)

  • Format: PDF
  • Language: English (eng)
  • Keywords

    Graph neural networks; Visualization; Semantics; Task analysis; Feature extraction; Annotations; Recurrent neural networks;
