Complementarity is the king: Multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval

Xinlei Pei; Zheng Liu; Shanshan Gao; Yijun Su

Abstract

Cross-modal retrieval uses a query from one modality to retrieve relevant results from another modality, and its key issue is how to learn cross-modal similarity. The complete semantic information of a specific concept is scattered across multi-modal and multi-grained data, and most existing methods cannot capture it thoroughly enough to learn cross-modal similarity accurately. We therefore propose a Multi-modal and Multi-grained Hierarchical Semantic Enhancement network (M²HSE), which obtains more complete semantic information in two stages by fusing the complementary information in multi-modal and multi-grained data. In stage 1, two classes of cross-modal similarity, primary similarity and auxiliary similarity, are calculated more comprehensively in two subnetworks. In particular, the primary similarities from the two subnetworks are fused to perform cross-modal retrieval, while the auxiliary similarity provides a valuable complement to the primary similarity. In stage 2, a multi-spring balance loss is proposed to optimize cross-modal similarity more flexibly. This loss selects the most representative samples to establish a multi-spring balance system, which adaptively optimizes the cross-modal similarities until it reaches an equilibrium state. Extensive experiments on public benchmark datasets demonstrate the effectiveness of the proposed method and show that its performance is competitive with state-of-the-art methods.
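The abstract does not state how the two primary similarities are fused. As a rough illustration only, the Python sketch below assumes that each subnetwork embeds the query and every gallery item (from the other modality) in its own space, scores each pair with cosine similarity, and fuses the two scores with a fixed weighted sum before ranking the gallery. The function names (`cosine`, `fused_similarity`, `retrieve`), the cosine metric, and the `weight` parameter are hypothetical and are not taken from the paper; the auxiliary similarity and the multi-spring balance loss are not modeled.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_similarity(q1: Vector, g1: Vector, q2: Vector, g2: Vector,
                     weight: float = 0.5) -> float:
    """HYPOTHETICAL fusion rule (assumption, not the paper's): weighted sum of
    the primary similarity from subnetwork 1 (q1, g1) and subnetwork 2 (q2, g2)."""
    return weight * cosine(q1, g1) + (1.0 - weight) * cosine(q2, g2)


def retrieve(query_1: Vector, query_2: Vector,
             gallery_1: Sequence[Vector], gallery_2: Sequence[Vector],
             weight: float = 0.5, top_k: Optional[int] = None) -> List[int]:
    """Return gallery indices ordered by fused similarity, most similar first.
    gallery_1[i] and gallery_2[i] are the two subnetworks' embeddings of item i."""
    scores = [fused_similarity(query_1, g1, query_2, g2, weight)
              for g1, g2 in zip(gallery_1, gallery_2)]
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranking if top_k is None else ranking[:top_k]


# Toy example: one query and a three-item gallery, each embedded by two subnetworks.
ranking = retrieve([1.0, 0.0], [0.0, 1.0],
                   [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]],
                   [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]])
print(ranking)  # [0, 2, 1]
```

In the toy example the fused scores are 1.0, 0.0 and about 0.754, so item 0 ranks first and item 1 last. A fixed weighted sum keeps each subnetwork's contribution easy to inspect; a learned or adaptive fusion would be an equally plausible reading of the abstract.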

机译：跨模态检索需要对一种模态进行查询，从另一种模态中检索相关结果，其关键问题在于如何学习跨模态相似性。需要注意的是，特定概念的完整语义信息广泛分散在多模态、多粒度的数据中，大多数现有方法无法准确学习跨模态相似性。因此，我们提出了一种多模态多粒度的分层语义增强网络（M~2HSE），该网络包含两个阶段，通过融合多模态和多粒度数据中的互补性来获得更完整的语义信息。在第 1 阶段，在两个子网中更全面地计算了两类跨模态相似性（主要相似性和辅助相似性）。特别是，融合了两个子网的主要相似性进行跨模态检索，而辅助相似性则为主要相似性提供了有价值的补充。在第二阶段，提出多弹簧平衡损失，以更灵活地优化跨模态相似性。利用这种损失，选择最具代表性的样本建立多弹簧平衡系统，自适应地优化跨模态相似性，直至达到平衡状态。在公共基准数据集上进行的大量实验清楚地证明了我们提出的方法的有效性，并显示了其与最先进的竞争性能。

Bibliographic information

Source
Expert Systems with Applications | 2023, Issue 4 | 119415.1-119415.21 | 21 pages
Affiliations

School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, 250014, Shandong, China, Shandong Provincial Key Laboratory of Digital Media Technology, Shandong University of Finance and Economics, Jinan, 250014, Shandong,;

School of Information Engineering, Minzu University of China, Beijing, 100081, China;

Original format: PDF
Language of full text: English
Keywords
Cross-modal retrieval; Primary similarity; Auxiliary similarity; Semantic enhancement; Multi-spring balance loss
