SubZero: A fine-grained lineage system for scientific databases

Abstract

Data lineage is a key component of provenance that helps scientists track and query relationships between input and output data. While current systems readily support lineage relationships at the file or data array level, finer-grained support at the array-cell level is impractical due to the lack of support for user-defined operators and the high runtime and storage overhead of storing such lineage. We interviewed scientists in several domains to identify a set of common semantics that can be leveraged to efficiently store fine-grained lineage. We use these insights to define lineage representations that efficiently capture common locality properties in the lineage data, and a set of APIs so operator developers can easily export lineage information from user-defined operators. Finally, we introduce two benchmarks derived from astronomy and genomics, and show that our techniques can reduce lineage query costs by up to 10× while incurring substantially less impact on workflow runtime and storage.
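To make the storage argument concrete, here is a minimal Python sketch of why cell-level lineage is expensive to materialize and how a locality property can stand in for it. This is an illustration only, not SubZero's representation or API: the smoothing-style operator, the function names `cellwise_lineage` and `backward_lineage`, and the `radius` parameter are all assumptions introduced for the example.

```python
from itertools import product


def cellwise_lineage(height, width, radius=1):
    """Naive fine-grained lineage for a hypothetical neighborhood operator.

    Stores one (output_cell, input_cell) pair for every dependency, so the
    size grows with the array size times the neighborhood size.
    """
    pairs = []
    for r, c in product(range(height), range(width)):
        for dr, dc in product(range(-radius, radius + 1), repeat=2):
            ir, ic = r + dr, c + dc
            if 0 <= ir < height and 0 <= ic < width:
                pairs.append(((r, c), (ir, ic)))
    return pairs


def backward_lineage(cell, height, width, radius=1):
    """Recompute the inputs of one output cell from the locality rule alone.

    Nothing is stored per cell; only the operator's radius and the array
    shape are needed to answer the query.
    """
    r, c = cell
    return {
        (ir, ic)
        for ir in range(max(0, r - radius), min(height, r + radius + 1))
        for ic in range(max(0, c - radius), min(width, c + radius + 1))
    }


if __name__ == "__main__":
    stored = cellwise_lineage(4, 4)
    print(len(stored), "explicit pairs for a 4x4 array")
    print(sorted(backward_lineage((0, 0), 4, 4)))
```

Both functions answer the same backward query for this operator; the difference is that the first pays a storage cost for every cell, while the second pays only a small compute cost at query time. Operators whose dependencies follow no regular pattern would still need explicit pairs.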
