Conference: 2010 5th Workshop on Workflows in Support of Large-Scale Science

Linking multiple workflow provenance traces for interoperable collaborative science

Abstract

Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At its heart lie (i) an abstract workflow and provenance model in which (ii) data sharing itself becomes part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can “stitch together” traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability, not only through often elusive workflow standards but also through shared provenance information from public repositories.
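To make the idea of "stitching together" traces concrete, here is a minimal sketch. It is not the authors' implementation and does not use any Kepler, Taverna, or DataONE API. It assumes each trace has already been reduced to plain dependency pairs in a common model, and that a record exists of which local input in one run is a copy of which published product of another run. All identifiers (`trace_a`, `trace_b`, `shared`, `stitch`, `lineage`, and the node names) are hypothetical.

```python
from collections import defaultdict

# Each trace is a list of (derived, source) pairs: `derived` depends on `source`.
# Node names are hypothetical and prefixed by run ("A:", "B:") to keep them unique.
trace_a = [
    ("A:invoke_clean", "A:raw.csv"),    # the invocation used the raw file
    ("A:clean.csv", "A:invoke_clean"),  # the cleaned file came from the invocation
]
trace_b = [
    ("B:invoke_plot", "B:input.csv"),
    ("B:plot.png", "B:invoke_plot"),
]

# Assumed sharing record: local copy used in run B -> published product of run A.
shared = {"B:input.csv": "A:clean.csv"}


def stitch(traces, shared):
    """Merge traces into one dependency graph, adding one edge per shared item."""
    depends_on = defaultdict(set)
    for trace in traces:
        for derived, source in trace:
            depends_on[derived].add(source)
    # The sharing step becomes an explicit edge in the combined graph.
    for local_copy, published in shared.items():
        depends_on[local_copy].add(published)
    return depends_on


def lineage(depends_on, node):
    """Return every node that `node` transitively depends on."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for source in depends_on.get(current, ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


combined = stitch([trace_a, trace_b], shared)
upstream = lineage(combined, "B:plot.png")
print(sorted(upstream))
```

With the sharing edge in place, the lineage of the plot from run B reaches back to the raw file of run A. Without it, the query stops at the local input of run B, which is the gap a combined trace is meant to close.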
