
Learning cross-modal correlations by exploring inter-word semantics and stacked co-attention



Abstract

Cross-modal information retrieval aims to find heterogeneous data of various modalities given a query of one modality. The main challenge is to learn the semantic correlations between different modalities and to measure the distance across them. For text-image retrieval, existing work mostly uses an off-the-shelf Convolutional Neural Network (CNN) for image feature extraction. For texts, word-level features such as bag-of-words or word2vec are used to build deep learning models that represent texts. Beyond word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we explore inter-word semantics by modelling texts as graphs using a similarity measure based on word2vec. Beyond feature representations, we further study the problem of information imbalance between modalities when describing the same semantics. For example, textual descriptions often contain background information that cannot be conveyed by images, and vice versa. We propose a stacked co-attention network to progressively learn the mutually attended features of different modalities and enhance their fine-grained correlations. A dual-path neural network is proposed for cross-modal information retrieval. The model is trained with a pairwise similarity loss function that maximizes the similarity of relevant text-image pairs and minimizes the similarity of irrelevant pairs. Experimental results show that the proposed model significantly outperforms state-of-the-art methods, with a 19% improvement in accuracy in the best case. (C) 2018 Elsevier B.V. All rights reserved.
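To make the graph-based text modelling concrete, the sketch below builds a small word graph per document: words are nodes, and an edge connects two words when their word2vec embeddings are sufficiently similar. The cosine-similarity edge rule, the 0.5 threshold, and the `build_text_graph` helper are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def build_text_graph(words, word2vec, threshold=0.5):
    """Build a word graph for one text: nodes are in-vocabulary words,
    weighted edges connect word pairs whose embeddings are close."""
    vocab = [w for w in words if w in word2vec]            # keep only words with embeddings
    n = len(vocab)
    adj = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(word2vec[vocab[i]], word2vec[vocab[j]])
            if sim > threshold:                            # edge only for semantically related words
                adj[i, j] = adj[j, i] = sim
    return vocab, adj
```

With a pretrained gensim word2vec model, `word2vec` can simply be its KeyedVectors object, since it supports both `w in word2vec` and `word2vec[w]`.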
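The pairwise training objective can be sketched as follows, assuming a margin-based hinge form with cosine scoring between the two paths of the dual-path network; the abstract only states that relevant text-image pairs should score high and irrelevant pairs low, so the margin value and the exact formula below are assumptions rather than the paper's definition.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(text_emb, image_emb, labels, margin=0.2):
    """text_emb, image_emb: (batch, dim) outputs of the two network paths.
    labels: (batch,) with 1 for relevant (matching) pairs and 0 for irrelevant ones."""
    labels = labels.float()
    sim = F.cosine_similarity(text_emb, image_emb, dim=1)       # similarity of each text-image pair
    pos = (1.0 - sim) * labels                                   # pull relevant pairs toward similarity 1
    neg = torch.clamp(sim - margin, min=0.0) * (1.0 - labels)   # push irrelevant pairs below the margin
    return (pos + neg).mean()
```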

Bibliographic record

  • Source
    Pattern Recognition Letters | 2020, Issue 2 | pp. 189-198 | 10 pages
  • Authors

  • Author affiliations

    Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China | School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China;

    School of Computer Science & Technology, Hangzhou Dianzi University, Hangzhou, China;

    Intelligent Computing & Machine Learning Lab, School of Automation Science and Electrical Engineering, Beihang University, Beijing, China;

    Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI);
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification (CLC)
  • Keywords

    Cross-modal correlation; Inter-word semantics; Fine-grained correlation; Stacked co-attention; Cross-modal retrieval;


