Knowledge-Based Systems

Generative label fused network for image–text matching



Abstract

© 2023 Elsevier B.V. Although there is a long line of research on bidirectional image–text matching, the problem remains challenging due to the well-known semantic gap between the visual and textual modalities. Popular solutions usually first detect objects and then find associations between the visual objects and textual words to estimate relevance; however, these methods focus only on visual object features while ignoring the semantic attributes of the detected regions, which are an important clue for bridging the semantic gap. To remedy this issue, we propose a generative multi-attribute label fusion method that further incorporates region attributes to alleviate the semantic gap. In particular, our method comprises three steps: image feature extraction, text feature extraction, and image–text matching via an attention mechanism. First, we divide the image into regions to obtain region image features and region attribute labels, and fuse them to reduce the semantic gap between the image and text features. Second, BERT and a bi-GRU are used to extract text features. Third, the attention mechanism aligns each region in the image with the words in the text that share its meaning. Quantitative and qualitative results on the public Flickr30K and MS-COCO datasets demonstrate the effectiveness of our method. The source code is released on GitHub: https://github.com/smileslabsh/Generative-Label-Fused-Network.
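The abstract describes the pipeline only at a high level. The sketch below illustrates one plausible reading of its three steps in PyTorch; it is a minimal sketch, not the authors' released implementation (see their GitHub link above). The module names, dimensions, attention temperature, and the plain word-embedding layer standing in for BERT are all assumptions made for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelFusedMatcher(nn.Module):
    """Illustrative sketch of the three steps in the abstract:
    (1) fuse region features with region attribute-label embeddings,
    (2) encode the sentence with a bi-GRU,
    (3) align words to regions with cross-attention and pool a score.
    All names and dimensions are assumptions, not the authors' code."""

    def __init__(self, region_dim=2048, vocab_size=10000, label_vocab=1600, dim=512):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, dim)       # visual object features
        self.label_embed = nn.Embedding(label_vocab, dim)   # attribute labels of regions
        self.word_embed = nn.Embedding(vocab_size, dim)     # stand-in for BERT embeddings
        self.gru = nn.GRU(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, regions, region_labels, words):
        # Step 1: fuse region appearance with its semantic attribute label.
        v = self.region_proj(regions) + self.label_embed(region_labels)   # (B, R, D)
        # Step 2: contextualise the words with a bidirectional GRU.
        t, _ = self.gru(self.word_embed(words))                           # (B, W, D)
        v = F.normalize(v, dim=-1)
        t = F.normalize(t, dim=-1)
        # Step 3: word-to-region cross-attention, then pool to one score.
        attn = torch.softmax(t @ v.transpose(1, 2) * 10.0, dim=-1)        # (B, W, R)
        attended = attn @ v                                               # (B, W, D)
        word_sims = F.cosine_similarity(t, attended, dim=-1)              # (B, W)
        return word_sims.mean(dim=-1)                                     # image–text score

In the full model described by the abstract, BERT would supply the word representations fed to the bi-GRU, and the pooled scores would typically be trained with a ranking objective over matched and mismatched image–text pairs; the sketch shows only the forward scoring path.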
