MTCProv: a practical provenance query framework for many-task scientific computing

Abstract

Scientific research is increasingly assisted by computer-based experiments. Such experiments are often composed of a vast number of loosely coupled computational tasks that are specified and automated as scientific workflows. This large scale is also characteristic of the data that flows within such "many-task" computations (MTC). Provenance information can record the behavior of such computational experiments via the lineage of process and data artifacts. However, work to date has focused on lineage data models, leaving unsolved the issues of recording and querying other aspects, such as domain-specific information about the experiments, MTC behavior given by resource consumption and failure information, or the impact of the environment on performance and accuracy. In this work we contribute MTCProv, a provenance query framework for many-task scientific computing that captures the runtime execution details of MTC workflow tasks on parallel and distributed systems, in addition to standard prospective and data derivation provenance. To help users query provenance data, we provide a high-level interface that hides relational query complexities. We evaluate MTCProv using an application in protein science, and describe how important query patterns, such as correlations between provenance, runtime data, and scientific parameters, are simplified and expressed.
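
To make the kind of query pattern mentioned in the abstract more concrete, the following is a minimal sketch of correlating runtime data with a scientific parameter and of stepping back one level in data lineage. It is not MTCProv's schema or query interface: the table names (`task_exec`, `task_param`, `derivation`), the column names, the helper functions, and the sample rows are all hypothetical and exist only for illustration, using Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical schema (an assumption, not MTCProv's actual one):
#   task_exec  - one row per task execution with runtime details
#   task_param - scientific parameters attached to a task
#   derivation - which task consumed ('in') or produced ('out') which file
SCHEMA = """
CREATE TABLE task_exec (task_id TEXT PRIMARY KEY, app TEXT,
                        duration_s REAL, exit_code INTEGER);
CREATE TABLE task_param (task_id TEXT, name TEXT, value REAL);
CREATE TABLE derivation (task_id TEXT, file TEXT, direction TEXT);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO task_exec VALUES (?, ?, ?, ?)", [
        ("t1", "fold", 120.0, 0),
        ("t2", "fold", 310.0, 0),
        ("t3", "fold", 95.0, 1),   # failed task, excluded from averages
        ("t4", "analyze", 12.0, 0),
    ])
    conn.executemany("INSERT INTO task_param VALUES (?, ?, ?)", [
        ("t1", "steps", 1000), ("t2", "steps", 3000), ("t3", "steps", 1000),
    ])
    conn.executemany("INSERT INTO derivation VALUES (?, ?, ?)", [
        ("t1", "p1.pdb", "out"), ("t2", "p2.pdb", "out"),
        ("t4", "p1.pdb", "in"), ("t4", "summary.txt", "out"),
    ])
    return conn


def mean_duration_by_param(conn, app, param):
    """Average runtime of successful tasks of `app`, grouped by a parameter value."""
    return conn.execute(
        """
        SELECT p.value, AVG(e.duration_s), COUNT(*)
        FROM task_exec AS e
        JOIN task_param AS p ON p.task_id = e.task_id
        WHERE e.app = ? AND p.name = ? AND e.exit_code = 0
        GROUP BY p.value
        ORDER BY p.value
        """,
        (app, param),
    ).fetchall()


def producers_of_inputs(conn, task_id):
    """One lineage step back: tasks that produced the files `task_id` consumed."""
    rows = conn.execute(
        """
        SELECT DISTINCT o.task_id
        FROM derivation AS i
        JOIN derivation AS o ON o.file = i.file AND o.direction = 'out'
        WHERE i.task_id = ? AND i.direction = 'in'
        ORDER BY o.task_id
        """,
        (task_id,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_duration_by_param(conn, "fold", "steps"))
    print(producers_of_inputs(conn, "t4"))
```

Wrapping the joins in small named functions is one plausible way for an interface to hide relational query complexity from users; the abstract does not describe the framework's actual mechanism.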

Bibliographic record

  • Source
    Distributed and Parallel Databases | 2012, Issue 6 | pp. 351-370 | 20 pages
  • Author affiliations

    Computer Engineering Program, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, National Laboratory for Scientific Computing, Petropolis, Brazil;

    Mathematics and Computer Science Division, Argonne National Laboratory, Chicago, USA;

    Computer Engineering Program, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil;

    Mathematics and Computer Science Division, Argonne National Laboratory, Chicago, USA, Computation Institute, Argonne National Laboratory and University of Chicago, Chicago, USA, Department of Computer Science, University of Chicago, Chicago, USA;

  • Original format: PDF
  • Text language: English
  • Keywords

    provenance; many-task computing; database queries; parallel and distributed computing;

