Knowledge-Based Systems

Generative label fused network for image–text matching



Abstract

© 2023 Elsevier B.V. Although there is a long line of research on bidirectional image–text matching, the problem remains challenging due to the well-known semantic gap between the visual and textual modalities. Popular solutions usually first detect objects and then find associations between the visual objects and textual words to estimate relevance; however, these methods focus only on visual object features while ignoring the semantic attributes of the detected regions, which are an important clue for bridging the semantic gap. To remedy this issue, we propose a generative multi-attribute label fusion method that further incorporates region attributes to alleviate the semantic gap. In particular, our method comprises three steps: image feature extraction, text feature extraction, and image–text matching via an attention mechanism. First, we divide the image into regions to obtain region image features and region attribute labels, and fuse them to reduce the semantic gap between the image and text features. Second, BERT and a bi-GRU are used to extract text features. Third, the attention mechanism aligns each region in the image with the words in the text that share its meaning. Quantitative and qualitative results on the public Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code is released on GitHub: https://github.com/smileslabsh/Generative-Label-Fused-Network.
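The abstract describes the pipeline only at a high level. The sketch below illustrates one plausible reading of its three steps in PyTorch; it is a minimal sketch, not the authors' released implementation (see their GitHub link above). The module names, dimensions, attention temperature, and the plain word-embedding layer standing in for BERT are all assumptions made for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelFusedMatcher(nn.Module):
    """Illustrative sketch of the three steps in the abstract:
    (1) fuse region features with region attribute-label embeddings,
    (2) encode the sentence with a bi-GRU,
    (3) align words to regions with cross-attention and pool a score.
    All names and dimensions are assumptions, not the authors' code."""

    def __init__(self, region_dim=2048, vocab_size=10000, label_vocab=1600, dim=512):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, dim)       # visual object features
        self.label_embed = nn.Embedding(label_vocab, dim)   # attribute labels of regions
        self.word_embed = nn.Embedding(vocab_size, dim)     # stand-in for BERT embeddings
        self.gru = nn.GRU(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, regions, region_labels, words):
        # Step 1: fuse region appearance with its semantic attribute label.
        v = self.region_proj(regions) + self.label_embed(region_labels)   # (B, R, D)
        # Step 2: contextualise the words with a bidirectional GRU.
        t, _ = self.gru(self.word_embed(words))                           # (B, W, D)
        v = F.normalize(v, dim=-1)
        t = F.normalize(t, dim=-1)
        # Step 3: word-to-region cross-attention, then pool to one score.
        attn = torch.softmax(t @ v.transpose(1, 2) * 10.0, dim=-1)        # (B, W, R)
        attended = attn @ v                                               # (B, W, D)
        word_sims = F.cosine_similarity(t, attended, dim=-1)              # (B, W)
        return word_sims.mean(dim=-1)                                     # image–text score

In the full model described by the abstract, BERT would supply the word representations fed to the bi-GRU, and the pooled scores would typically be trained with a ranking objective over matched and mismatched image–text pairs; the sketch shows only the forward scoring path.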
