Datatrack: An R package for managing data in a multi-stage experimental workflow: data versioning and provenance considerations in interactive scripting

Abstract

In experimental research using computation, a workflow is a sequence of data processing or analysis steps in which the output of one step may be used as the input of another. The processing steps may involve user-supplied parameters that, when modified, produce a new version of the input to the downstream steps, which in turn generate new versions of their own output. As more experimentation is done, the results of these steps can become numerous. It is important to keep track of which data output depends on which other generated data, and which parameters were used. In many situations, scientific workflow management systems solve this problem, but these systems are best suited to collaborative, distributed experiments using a variety of services, possibly with batch-processed parameter sweeps. This paper presents an R package for managing and navigating a network of interdependent data. It is intended as a lightweight tool that gives the experimenter some visual data provenance information, so that they can manage their generated data as they run experiments within their familiar scripting environment, where it may not be desirable to commit to a full-blown, comprehensive workflow manager. The package consists of wrapper functions for writing and reading output data, which can be called from within R analysis scripts, and a visualization of the data-output dependency graph rendered within the RStudio console. It thus benefits the experimenter while requiring minimal commitment to integrate into their existing working environment.
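The idea of wrapper functions that record parameters and upstream dependencies on every write can be sketched as follows. This is a language-agnostic illustration written in Python, not the Datatrack R API: the class `DataStore`, its methods `write_output`, `read_output`, and `lineage`, the `name@version` key format, and the JSON storage layout are all hypothetical names and choices made for this example. It assumes a version can be derived by hashing the output name, its parameters, and the versions of its inputs, so that changing a parameter upstream yields new versions downstream.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class DataStore:
    """Hypothetical tracker: stores each output with its parameters and upstream versions."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.index_path = self.root / "index.json"
        # The index maps "name@version" keys to their recorded metadata.
        self.index = json.loads(self.index_path.read_text()) if self.index_path.exists() else {}

    def write_output(self, name, data, params=None, dependencies=None):
        """Save `data` and return a key whose version is derived from params and inputs."""
        meta = {"name": name, "params": params or {}, "dependencies": sorted(dependencies or [])}
        version = hashlib.sha256(json.dumps(meta, sort_keys=True).encode()).hexdigest()[:8]
        key = f"{name}@{version}"
        (self.root / f"{key}.json").write_text(json.dumps(data))
        self.index[key] = meta
        self.index_path.write_text(json.dumps(self.index, indent=2))
        return key

    def read_output(self, key):
        """Load a stored output by its 'name@version' key."""
        return json.loads((self.root / f"{key}.json").read_text())

    def lineage(self, key):
        """Return every upstream key that `key` depends on, directly or indirectly."""
        seen = []
        stack = list(self.index[key]["dependencies"])
        while stack:
            dep = stack.pop()
            if dep not in seen:
                seen.append(dep)
                stack.extend(self.index[dep]["dependencies"])
        return seen


# Usage: two parameter settings of the same step give two distinct versions.
store = DataStore(tempfile.mkdtemp())
raw = store.write_output("raw", [1, 2, 3, 4])
filtered_a = store.write_output("filtered", [3, 4], params={"threshold": 2}, dependencies=[raw])
filtered_b = store.write_output("filtered", [4], params={"threshold": 3}, dependencies=[raw])
summary = store.write_output("summary", {"mean": 3.5}, dependencies=[filtered_a])
print(store.lineage(summary))
```

Because the version is a hash of the recorded metadata, rewriting an output with identical parameters and inputs reuses the same key, while any upstream change propagates as a new key. A real tool would also need the graph visualization the abstract describes, which this sketch omits.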

Bibliographic details

Source: IEEE International Conference on e-Science | 2016 | 451p | 8 pages in total
Authors: Philip Eichinski; Paul Roe
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Metadata; Standards; Workflow management software; Data processing; Collaboration; Writing; Data visualization

Similar literature

1. Datastorr: a workflow and package for delivering successive versions of 'evolving data' directly into R [J]. Falster Daniel S, FitzJohn Richard G, Pennell Matthew W. GigaScience. 2019, No. 5.
2. Managing data provenance for bioinformatics workflows using AProvBio [J]. Rodrigo Almeida, Waldeyr Mendes Cordeiro Da Silva, Klayton Castro. International Journal of Computational Biology and Drug Design. 2019, No. 2.
3. Collaborative filtering over evolution provenance data for interactive visual data exploration [J]. Ben Lahmar Houssem, Herschel Melanie. Information Systems. 2021, No. Jana.
4. Datatrack: An R package for managing data in a multi-stage experimental workflow: data versioning and provenance considerations in interactive scripting [C]. Philip Eichinski, Paul Roe. IEEE International Conference on e-Science. 2016.
5. Querying and managing Semantic Web data and Scientific Workflow Provenance using relational databases [D]. Chebotko, Artem. 2008.
6. Datastorr: a workflow and package for delivering successive versions of evolving data directly into R [O]. Daniel S Falster, Richard G FitzJohn, Matthew W Pennell. 2019.
7. Datatrack: An R package for managing data in a multi-stage experimental workflow: data versioning and provenance considerations in interactive scripting [O]. Eichinski Philip, Roe Paul. 2016.
