International Conference on Pattern Recognition

Beyond the Deep Metric Learning: Enhance the Cross-Modal Matching with Adversarial Discriminative Domain Regularization


Abstract

Matching information across image and text modalities is a fundamental challenge for many applications that involve both vision and natural language processing. The objective is to find efficient similarity metrics for comparing visual and textual information. Existing approaches mainly match local visual objects and sentence words in a shared space using attention mechanisms. Matching performance remains limited because the similarity computation relies on simple comparisons of the matching features and ignores the characteristics of their distribution in the data. In this paper, we address this limitation with an efficient learning objective that considers the discriminative feature distributions between the visual objects and sentence words. Specifically, we propose a novel Adversarial Discriminative Domain Regularization (ADDR) learning framework, which goes beyond the conventional metric learning objective, to construct a set of discriminative data domains within each image-text pair. Our approach can generally improve the learning efficiency and performance of existing metric learning frameworks by regulating the distribution of the hidden space between the matching pairs. The experimental results show that this new approach significantly improves the overall performance of several popular cross-modal matching techniques (SCAN [13], VSRN [14], BFAN [15]) on the MS-COCO and Flickr30K benchmarks.
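
For readers unfamiliar with the baseline that the abstract calls the metric learning objective, the following is a minimal sketch of a hinge-based bidirectional triplet ranking loss over an image-text similarity matrix, a common choice in cross-modal matching. It is not the ADDR method, whose details the abstract does not give. The function names, the cosine similarity, the margin of 0.2, and the toy embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(images, texts):
    """sim[i][j] is the similarity of image i and text j.
    Matched image-text pairs are assumed to share the same index."""
    return [[cosine(img, txt) for txt in texts] for img in images]

def triplet_ranking_loss(sim, margin=0.2):
    """Hinge loss summed over every mismatched pair, in both directions.
    The margin value is an illustrative assumption."""
    n = len(sim)
    loss = 0.0
    for i in range(n):
        pos = sim[i][i]  # score of the matched pair
        for j in range(n):
            if j == i:
                continue
            # image i scored against a non-matching text j
            loss += max(0.0, margin - pos + sim[i][j])
            # text i scored against a non-matching image j
            loss += max(0.0, margin - pos + sim[j][i])
    return loss

# Toy 2-D embeddings (hypothetical data).
images = [[1.0, 0.0], [0.0, 1.0]]
texts_good = [[1.0, 0.0], [0.0, 1.0]]  # aligned with the images
texts_bad = [[0.0, 1.0], [1.0, 0.0]]   # swapped, so every pair is mismatched

loss_good = triplet_ranking_loss(similarity_matrix(images, texts_good))
loss_bad = triplet_ranking_loss(similarity_matrix(images, texts_bad))
```

In this sketch the loss depends only on pairwise similarity scores, which is the limitation the abstract describes: nothing in the objective looks at how the features are distributed.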
