IEEE International Conference on Acoustics, Speech and Signal Processing

Cross-modality matching based on Fisher Vector with neural word embeddings and deep image features



Abstract

Cross-modal retrieval, which aims to solve the problem that the query and the retrieved results come from different modalities, has become increasingly important with the development of the Internet. In this paper, we mainly focus on exploring high-level semantic representations of images and text for cross-modal matching. Deep convolutional image features and Fisher Vectors computed over neural word embeddings are used as the visual and textual features, respectively. To further investigate the correlation among heterogeneous multimodal features, we use a multiclass logistic classifier for semantic matching across modalities. Experiments on the Wikipedia and Pascal Sentence datasets demonstrate the robustness and effectiveness of the approach for both Img2Text and Text2Img retrieval tasks.
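A minimal sketch of the pipeline the abstract describes, assuming precomputed CNN image features and per-document word-embedding matrices; the Fisher Vector encoding follows the standard improved-FV recipe (gradients with respect to GMM means and diagonal variances, power and L2 normalization), and all data shapes, variable names, and hyperparameters below are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import normalize

def fisher_vector(word_vecs, gmm):
    """Encode a set of word embeddings (N x D) as a Fisher Vector of size 2*K*D."""
    N, _ = word_vecs.shape
    post = gmm.predict_proba(word_vecs)                  # (N, K) soft assignments
    mu, sigma2, w = gmm.means_, gmm.covariances_, gmm.weights_
    sigma = np.sqrt(sigma2)                              # diagonal std devs, (K, D)
    diff = (word_vecs[:, None, :] - mu[None, :, :]) / sigma[None, :, :]  # (N, K, D)
    # Gradients w.r.t. means and (diagonal) variances of the GMM
    g_mu = (post[:, :, None] * diff).sum(0) / (N * np.sqrt(w)[:, None])
    g_sig = (post[:, :, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))               # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)             # L2 normalization

# Hypothetical training data: CNN activations per image, word-embedding matrices
# per text, and shared semantic class labels for both modalities.
rng = np.random.default_rng(0)
img_feats = rng.normal(size=(40, 4096))
texts = [rng.normal(size=(rng.integers(20, 60), 300)) for _ in range(40)]
labels = rng.integers(0, 10, size=40)

gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(np.vstack(texts))                                # GMM over word embeddings
txt_feats = np.stack([fisher_vector(t, gmm) for t in texts])

# Semantic matching: map each modality to class-probability vectors with a
# multiclass logistic classifier, then compare in that shared semantic space.
img_clf = LogisticRegression(max_iter=1000).fit(img_feats, labels)
txt_clf = LogisticRegression(max_iter=1000).fit(txt_feats, labels)
img_sem = normalize(img_clf.predict_proba(img_feats))
txt_sem = normalize(txt_clf.predict_proba(txt_feats))

# Img2Text retrieval: rank texts by cosine similarity to an image query.
query = img_sem[0]
ranking = np.argsort(-(txt_sem @ query))
print("Top-5 retrieved text indices:", ranking[:5])
```

Text2Img retrieval works symmetrically by ranking `img_sem` rows against a text query vector in the same probability space.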
