Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

Mariam Alqasab, Suzanne M. Embury, Sandra de F. Mendes Sampaio


Abstract

In the era of data science, datasets are shared widely and used for many purposes unforeseen by the original creators of the data. In this context, defects in datasets can have far-reaching consequences, spreading from dataset to dataset, and affecting the consumers of data in ways that are hard to predict or quantify. Some form of waste is often the result. For example, scientists using defective data to propose hypotheses for experimentation may waste their limited wet lab resources chasing the wrong experimental targets. Scarce drug trial resources may be used to test drugs that actually have little chance of giving a cure. Because of the potential real-world costs, database owners care about providing high-quality data. Automated curation tools can be used to an extent to discover and correct some forms of defect. However, in some areas human curation, performed by highly trained domain experts, is needed to ensure that the data represents our current interpretation of reality accurately. Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to enable the full value to be obtained from the curation effort currently available. In this paper, we explore one possible approach to maximising the value obtained from human curators, by automatically extracting information about data defects and corrections from the work that the curators do. This information is packaged in a source-independent form, to allow it to be used by the owners of other databases (for which human curation effort is not available or is insufficient). This amplifies the efforts of the human curators, allowing their work to be applied to other sources, without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects, which can also be found in other sources.
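The abstract does not say how defects and corrections are extracted or packaged, so the following is only a minimal sketch of one way the idea could work, under two assumptions that are not taken from the paper: that the curated database is available as successive snapshots keyed by record ID, and that a correction can be represented as a (field, defective value, corrected value) triple. The names `Correction`, `extract_corrections` and `find_defects` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple

# A snapshot maps record ID -> {field name -> value}. This layout is an
# assumption made for illustration, not the format used in the paper.
Snapshot = Dict[str, Dict[str, str]]


class Correction(NamedTuple):
    """Hypothetical source-independent record of one curator change."""
    field: str
    defective_value: str
    corrected_value: str


def extract_corrections(old: Snapshot, new: Snapshot) -> List[Correction]:
    """Compare two snapshots of a curated source and collect changed values."""
    corrections = []
    for record_id, old_fields in old.items():
        new_fields = new.get(record_id)
        if new_fields is None:
            continue  # record was removed; not treated as a value correction
        for field, old_value in old_fields.items():
            new_value = new_fields.get(field)
            if new_value is not None and new_value != old_value:
                corrections.append(Correction(field, old_value, new_value))
    return corrections


def find_defects(
    target: Snapshot, corrections: List[Correction]
) -> List[Tuple[str, str, str, str]]:
    """Flag records in another source that still hold a replaced value."""
    known = {(c.field, c.defective_value): c.corrected_value for c in corrections}
    hits = []
    for record_id, fields in target.items():
        for field, value in fields.items():
            if (field, value) in known:
                hits.append((record_id, field, value, known[(field, value)]))
    return hits
```

Because the correction triples carry no record IDs from the curated source, they can be matched against any other database that uses the same field names. A real system would need to handle schema mapping and would have to distinguish genuine corrections from routine updates, neither of which this sketch attempts.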


Bibliographic record

Source: International Journal of Digital Curation, 2017, Issue 1, 12 pages
Authors: Mariam Alqasab, Suzanne M. Embury, Sandra de F. Mendes Sampaio
Original format: PDF
Chinese Library Classification: Agricultural science
Date added: 2022-08-18 08:59:00

