Pattern Recognition: The Journal of the Pattern Recognition Society

Learning visual and textual representations for multimodal matching and classification

Abstract

Multimodal learning, which aims to bridge the modality gap between heterogeneous representations such as vision and language, has been an important and challenging problem for decades. Unlike many current approaches, which focus only on either multimodal matching or classification, we propose a unified network that jointly learns multimodal matching and classification (MMC-Net) between images and texts. The proposed MMC-Net model seamlessly integrates the matching and classification components: it first learns visual and textual embedding features in the matching component, and then generates discriminative multimodal representations in the classification component. Combining the two components in a unified model helps improve the performance of both. Moreover, we present a multi-stage training algorithm that minimizes both the matching and classification loss functions. Experimental results on four well-known multimodal benchmarks demonstrate the effectiveness and efficiency of the proposed approach, which achieves competitive performance for multimodal matching and classification compared to state-of-the-art approaches. (C) 2018 Published by Elsevier Ltd.
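To make the described design concrete, below is a minimal PyTorch sketch of the two-component idea: a matching component that projects image and text features into a shared embedding space, and a classification component that classifies the fused multimodal representation, trained by minimizing the sum of a matching loss and a classification loss. The module names, feature dimensions, the bidirectional hinge ranking loss, and the two-stage schedule are all illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of joint matching + classification (assumptions: PyTorch,
# precomputed image/text features; the paper's exact components may differ).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MMCNetSketch(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, embed_dim=512, num_classes=20):
        super().__init__()
        # Matching component: project each modality into a shared space.
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # Classification component: consumes the fused multimodal feature.
        self.classifier = nn.Linear(embed_dim * 2, num_classes)

    def forward(self, img_feat, txt_feat):
        v = F.normalize(self.img_proj(img_feat), dim=-1)  # visual embedding
        t = F.normalize(self.txt_proj(txt_feat), dim=-1)  # textual embedding
        logits = self.classifier(torch.cat([v, t], dim=-1))
        return v, t, logits

def matching_loss(v, t, margin=0.2):
    """Bidirectional hinge ranking loss over in-batch negatives, a common
    choice for image-text matching (an assumption here, not the paper's)."""
    sim = v @ t.t()                  # cosine similarities; embeddings are unit-norm
    pos = sim.diag().unsqueeze(1)    # matched pairs lie on the diagonal
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost_t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)      # image -> text
    cost_v = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)  # text -> image
    return cost_t.mean() + cost_v.mean()

model = MMCNetSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
img = torch.randn(8, 2048)             # e.g. CNN image features
txt = torch.randn(8, 300)              # e.g. sentence embeddings
labels = torch.randint(0, 20, (8,))

# One plausible reading of the multi-stage schedule (an assumption):
# stage 1 optimizes the matching loss alone; stage 2 adds the
# classification loss and minimizes the joint objective.
for stage in (1, 2):
    opt.zero_grad()
    v, t, logits = model(img, txt)
    loss = matching_loss(v, t)
    if stage == 2:
        loss = loss + F.cross_entropy(logits, labels)
    loss.backward()
    opt.step()
```

Training the matching component first gives the classifier aligned embeddings to fuse, which is one way to read the claim that combining the two components in a unified model helps both tasks.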
