International Journal of Digital Curation

Revisiting the Data Lifecycle with Big Data Curation



Abstract

As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented. In parallel, research data repositories have been built to host research data in response to the requirements of sponsors that research data be publicly available. Libraries are re-inventing themselves to respond to a growing demand to manage, store, curate, and preserve the data produced in the course of publicly funded research. As librarians and data managers develop the tools and knowledge they need to meet these new expectations, they inevitably encounter conversations around Big Data. This paper explores definitions of Big Data that have coalesced over the last decade around four commonly mentioned characteristics: volume, variety, velocity, and veracity. We highlight the issues associated with each characteristic, particularly their impact on data management and curation. Using the methodological framework of the data life cycle model, we assess two models developed in the context of Big Data projects and find them lacking. We propose a Big Data life cycle model that includes activities focused on Big Data and that more closely integrates curation with the research life cycle. These activities include planning, acquiring, preparing, analyzing, preserving, and discovering, with describing the data and assuring quality being an integral part of each activity. We discuss the relationship between institutional data curation repositories and new long-term data resources associated with high-performance computing centers, as well as reproducibility in computational science. We apply this model by mapping the four characteristics of Big Data outlined above to each of the activities in the model. This mapping produces a set of questions that practitioners should be asking in a Big Data project.