IEEE Transactions on Pattern Analysis and Machine Intelligence

Image and Sentence Matching via Semantic Concepts and Order Learning



Abstract

Image and sentence matching has made great progress recently, but it remains challenging because of the large visual-semantic discrepancy between the two modalities. This discrepancy mainly arises from two aspects: 1) images consist of unstructured content that is not as semantically abstract as the words in sentences, so the two are not directly comparable; and 2) arranging the same semantic concepts in different orders can lead to quite different meanings. The words in a sentence are arranged sequentially according to grammar, whereas the semantic concepts in an image are usually unorganized. In this work, we propose a semantic concepts and order learning framework for image and sentence matching, which improves the image representation by first predicting semantic concepts and then organizing them in a correct semantic order. Given an image, we first use a multi-regional multi-label CNN to predict the semantic concepts it contains in terms of objects, properties and actions. These word-level semantic concepts are directly comparable with the nouns, adjectives and verbs in the matched sentence. Then, to organize these concepts so that they express meanings similar to those of the matched sentence, we use a context-modulated attentional LSTM to learn the semantic order. At each timestep, it regards the predicted semantic concepts and the global scene of the image as context, and selectively attends to concept-related image regions by referring to this context in a sequential order. To further enhance the semantic order, we perform additional sentence generation from the image representation, using the ground-truth order in the matched sentence as supervision. After obtaining the improved image representation, we learn the sentence representation with a conventional LSTM, and then jointly perform image and sentence matching and sentence generation for model learning.
Extensive experiments demonstrate the effectiveness of the learned semantic concepts and order, achieving state-of-the-art results on two public benchmark datasets.
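The abstract does not say how the multi-regional multi-label CNN turns region-level predictions into image-level concepts. Below is a minimal sketch that assumes each region yields one logit per concept and that image-level scores come from max-pooling sigmoid probabilities across regions. This is a common multi-label pooling choice, not confirmed as the paper's method. The function name `aggregate_concepts`, the 0.5 threshold, and the toy vocabulary are hypothetical.

```python
import math
from typing import Dict, Sequence


def sigmoid(x: float) -> float:
    """Map a logit to an independent per-concept probability (multi-label)."""
    return 1.0 / (1.0 + math.exp(-x))


def aggregate_concepts(region_logits: Sequence[Sequence[float]],
                       vocab: Sequence[str],
                       threshold: float = 0.5) -> Dict[str, float]:
    """Hypothetical helper: max-pool concept probabilities over regions.

    region_logits[r][c] is the (assumed) logit of concept c in region r.
    Returns the concepts whose pooled probability reaches `threshold`,
    in vocabulary order.
    """
    if not region_logits:
        raise ValueError("need at least one region")
    num_concepts = len(vocab)
    image_probs = [0.0] * num_concepts
    for logits in region_logits:
        if len(logits) != num_concepts:
            raise ValueError("each region must score every concept")
        for c, z in enumerate(logits):
            # Assumption: a concept is present in the image if any region shows it.
            image_probs[c] = max(image_probs[c], sigmoid(z))
    return {vocab[c]: p for c, p in enumerate(image_probs) if p >= threshold}
```

For example, with `vocab = ["dog", "brown", "running", "car"]` (an object, a property, an action and another object) and two regions with logits `[2.0, -1.0, 0.5, -3.0]` and `[0.0, 1.5, -2.0, -4.0]`, the sketch keeps `dog`, `brown` and `running` and drops `car`. Max-pooling lets a concept that appears in only one region still score highly for the whole image.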
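The abstract says image and sentence matching is learned jointly with sentence generation, but it does not give the matching objective. A widely used choice for this task is a bidirectional hinge-based ranking loss over cosine similarities. The sketch below assumes that objective, which is not confirmed as the paper's exact loss. It treats the image and sentence representations as plain vectors, uses a hypothetical default margin of 0.2, and omits the sentence-generation term.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def bidirectional_ranking_loss(images: Sequence[Sequence[float]],
                               sentences: Sequence[Sequence[float]],
                               margin: float = 0.2) -> float:
    """Assumed hinge ranking loss for a batch where images[k] matches sentences[k]."""
    n = len(images)
    if n != len(sentences):
        raise ValueError("images and sentences must be paired one-to-one")
    # sim[i][j] = similarity between image i and sentence j.
    sim: List[List[float]] = [[cosine(images[i], sentences[j]) for j in range(n)]
                              for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i as query: non-matching sentence j should score
            # at least `margin` below the matching sentence i.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Sentence i as query: non-matching image j should score
            # at least `margin` below the matching image i.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss
```

With perfectly aligned pairs such as `images = [[1, 0], [0, 1]]` and `sentences = [[1, 0], [0, 1]]`, every negative already sits at least `margin` below its positive, so the loss is `0.0`. Swapping the two sentences makes all four hinge terms active. Summing over both directions means the image representation and the sentence representation are both pushed toward their matched partner.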
