Conference: IEEE International Conference on Multimedia and Expo

WIKI-CMR: A web cross modality dataset for studying and evaluation of cross modality retrieval models

Abstract

With the growing popularity of Web multimedia data, cross-modality retrieval has become an urgent and challenging problem. Its main challenges are bridging the semantic gap between different modalities and handling large volumes of data. A well-designed dataset can provide a platform for developing state-of-the-art cross-modality retrieval algorithms. However, existing Web cross-modality datasets are either small or omit information such as the hyperlink structure. In this paper, we introduce a new Web cross-modality dataset called “WIKI-CMR”, which uses Wikipedia as a reliable and information-rich data source and is collected with a smart crawling strategy. The dataset comprises 74,961 documents containing textual paragraphs, images, and hyperlinks. All documents are categorized into 11 semantic topics. We point out several challenges posed by this dataset and use it to evaluate several well-known cross-modality retrieval models.
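
The abstract describes each document as a bundle of paragraphs, images, and hyperlinks with one of 11 topic labels. A minimal sketch of how such a record might be held in memory, and how a ranked retrieval result could be scored against topic labels, is shown below. The class name `WikiDocument`, its field names, the integer topic encoding, and the choice of mean average precision as the metric are all assumptions made for illustration; the abstract does not specify the dataset's file layout or the evaluation protocol.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WikiDocument:
    """Hypothetical record layout; field names are illustrative, not the dataset's schema."""
    doc_id: str
    topic: int  # assumed encoding: one of the 11 semantic topics as 0..10
    paragraphs: List[str] = field(default_factory=list)
    image_paths: List[str] = field(default_factory=list)
    hyperlinks: List[str] = field(default_factory=list)  # doc_ids of linked documents


def average_precision(query_topic: int, ranked_topics: List[int]) -> float:
    """Average precision for one query.

    A retrieved item counts as relevant when its topic equals the query's topic
    (an assumed relevance criterion, not one stated in the abstract).
    """
    hits = 0
    precision_sum = 0.0
    for rank, topic in enumerate(ranked_topics, start=1):
        if topic == query_topic:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: List[Tuple[int, List[int]]]) -> float:
    """Mean of average_precision over (query_topic, ranked_topics) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(q, r) for q, r in queries) / len(queries)
```

In a text-to-image setting, `ranked_topics` would be the topic labels of the images returned for a text query, in ranked order; the same functions apply unchanged to image-to-text retrieval.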