A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge

Abstract

The rapid proliferation of publicly available biomedical datasets has provided abundant resources that are potentially of value as a means to reproduce prior experiments, and to generate and explore novel hypotheses. However, there are a number of barriers to the re-use of such datasets, which are distributed across a broad array of dataset repositories, focusing on different data types and indexed using different terminologies. New methods are needed to enable biomedical researchers to locate datasets of interest within this rapidly expanding information ecosystem, and new resources are needed for the formal evaluation of these methods as they emerge. In this paper, we describe the design and generation of a benchmark for information retrieval of biomedical datasets, which was developed and used for the 2016 bioCADDIE Dataset Retrieval Challenge. In the tradition of the seminal Cranfield experiments, and as exemplified by the Text Retrieval Conference (TREC), this benchmark includes a corpus (biomedical datasets), a set of queries, and relevance judgments relating these queries to elements of the corpus. This paper describes the process through which each of these elements was derived, with a focus on those aspects that distinguish this benchmark from typical information retrieval reference sets. Specifically, we discuss the origin of our queries in the context of a larger collaborative effort, the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, and the distinguishing features of biomedical dataset retrieval as a task. The resulting benchmark set has been made publicly available to advance research in the area of biomedical dataset retrieval. Database URL:
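
The three components named in the abstract (a corpus, a set of queries, and relevance judgments) can be illustrated with a minimal Python sketch of how a Cranfield-style benchmark is typically used to score a retrieval system. Everything here is an assumption for illustration: the dataset and query identifiers, the graded judgment values, and the choice of precision at k and average precision as metrics are hypothetical and are not taken from the paper or the challenge.

```python
from typing import Dict, List

# Hypothetical relevance judgments: query id -> {dataset id -> grade}.
# A grade above 0 is treated as relevant; the values are made up.
qrels: Dict[str, Dict[str, int]] = {
    "q1": {"ds1": 2, "ds3": 1, "ds4": 0},
}

# Hypothetical system output: query id -> ranked list of dataset ids.
run: Dict[str, List[str]] = {
    "q1": ["ds3", "ds2", "ds1", "ds4"],
}


def precision_at_k(ranked: List[str], judged: Dict[str, int], k: int) -> float:
    """Fraction of the top-k results judged relevant (unjudged counts as not relevant)."""
    hits = sum(1 for doc_id in ranked[:k] if judged.get(doc_id, 0) > 0)
    return hits / k


def average_precision(ranked: List[str], judged: Dict[str, int]) -> float:
    """Mean of the precision values at each rank where a relevant item appears."""
    total_relevant = sum(1 for grade in judged.values() if grade > 0)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if judged.get(doc_id, 0) > 0:
            hits += 1
            score += hits / rank
    return score / total_relevant


for query_id, ranked in run.items():
    judged = qrels[query_id]
    print(query_id, precision_at_k(ranked, judged, 2), average_precision(ranked, judged))
```

Treating unjudged datasets as not relevant is a common convention in this kind of evaluation, but it is a design choice in this sketch rather than something the abstract specifies.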

Bibliographic details

Journal: Database: The Journal of Biological Databases and Curation
Authors: Trevor Cohen; Kirk Roberts; Anupama E. Gururaj; Xiaoling Chen; Saeid Pournejati; George Alter; William R. Hersh; Dina Demner-Fushman; Lucila Ohno-Machado; Hua Xu
Year (volume), issue: 2017 (2017), -1
Year: 2017
Pages: bax061
Total pages: 10
Original format: PDF
Chinese Library Classification: Biology

Similar literature

1. Two novel benchmark datasets from ArcGIS and Bing World Imagery for remote sensing image retrieval [J]. Hou Dongyang, Miao Zelang, Xing Huaqiao. International Journal of Remote Sensing. 2021, issue 1a2.
2. V-RSIR: A web-based tool and benchmark dataset for remote sensing image retrieval [J]. Hou D., Xing H. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2019, issue 1.
3. PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval [J]. Zhou Weixun, Newsam Shawn, Li Congmin. ISPRS Journal of Photogrammetry and Remote Sensing. 2018, issue NOVa.
4. METU dataset: A big dataset for benchmarking trademark retrieval [C]. Tursun Osman, Kalkan Sinan. IAPR International Conference on Machine Vision Applications. 2015.
5. Information Retrieval in Biomedical Research: From Articles to Datasets [D]. Wei, Wei. 2017.
6. Query expansion using MeSH terms for dataset retrieval: OHSU at the bioCADDIE 2016 dataset retrieval challenge [O]. Theodore B Wright, David Ball, William Hersh. 2017.
7. Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval [O]. Tobias Weyand, Andre Araujo, Bingyi Cao. 2020.
