
Bidirectional Retrieval Made Simple



Abstract

This paper presents a simple yet effective character-level architecture for learning bidirectional retrieval models. Aligning multimodal content is particularly challenging given the difficulty of finding semantic correspondences between images and descriptions. We introduce an efficient character-level inception module, designed to learn textual semantic embeddings by convolving raw characters at distinct granularity levels. Our approach explicitly encodes hierarchical information from distinct base-level representations (e.g., characters, words, and sentences) into a shared multimodal space, where it captures the semantic correspondence between images and descriptions via a contrastive pairwise loss function that minimizes order-violations. Models generated by our approach are far more robust to input noise than state-of-the-art strategies based on word embeddings. Despite being conceptually much simpler and requiring fewer parameters, our models outperform the state-of-the-art approaches by 4.8% in description retrieval and by 2.7% in image retrieval (absolute R@1 values) on the popular MS COCO retrieval dataset. We also show that our models deliver solid performance on text classification, especially in multilingual and noisy domains.
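The two central ideas of the abstract can be sketched in a few lines: parallel convolutions over raw characters with several kernel widths (the "inception-style" granularity levels), and an order-violation penalty that is zero only when one embedding dominates the other coordinate-wise. This is a toy illustration with randomly initialized weights, not the paper's trained model; the alphabet, dimensions, and kernel widths below are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # toy character vocabulary (assumption)
EMB = 8                                   # per-character embedding size (assumption)
FILTERS = 4                               # filters per convolutional branch
KERNELS = (2, 3, 5)                       # granularity levels: character n-gram widths

char_emb = rng.normal(size=(len(ALPHABET), EMB))
# one weight bank per kernel width: inception-style parallel branches
weights = {k: rng.normal(size=(k * EMB, FILTERS)) for k in KERNELS}

def embed_text(text):
    """Map raw characters to a fixed-size vector via parallel 1-D convolutions."""
    idx = [ALPHABET.index(c) for c in text.lower() if c in ALPHABET]
    x = char_emb[idx]                                  # (T, EMB)
    branches = []
    for k, w in weights.items():
        # valid convolution with window width k, ReLU, then max-pool over time
        windows = np.stack([x[t:t + k].ravel() for t in range(len(idx) - k + 1)])
        branches.append(np.maximum(windows @ w, 0.0).max(axis=0))
    v = np.concatenate(branches)                       # (FILTERS * len(KERNELS),)
    return np.abs(v)   # keep embeddings in the non-negative orthant (order-embedding style)

def order_violation(a, b):
    """Penalty ||max(0, b - a)||^2: zero iff b <= a coordinate-wise."""
    return float((np.maximum(0.0, b - a) ** 2).sum())

caption = embed_text("a dog runs on the beach")
image = caption + 0.1          # a compatible "image" embedding (toy stand-in)
assert order_violation(image, caption) == 0.0  # correct order: no violation
assert order_violation(caption, image) > 0.0   # reversed order is penalized
```

A contrastive pairwise loss would then push `order_violation` toward zero for matching image–caption pairs while enforcing a margin against mismatched pairs sampled from the batch.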
