Journal: Data & Knowledge Engineering

RDFProv: A relational RDF store for querying and managing scientific workflow provenance

Abstract

Provenance metadata has become increasingly important to support scientific discovery reproducibility, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of the modeling, recording, representation, integration, storage, and querying of provenance metadata. Our approach to provenance management seamlessly integrates the interoperability, extensibility, and inference advantages of Semantic Web technologies with the storage and querying power of an RDBMS to meet the emerging requirements of scientific workflow provenance management. In this paper, we elaborate on the design of a relational RDF store, called RDFProv, which is optimized for scientific workflow provenance querying and management. Specifically, we propose: (i) two schema mapping algorithms to map an OWL provenance ontology to a relational database schema that is optimized for common provenance queries; (ii) three efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema; and (iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on the fly by using the type information of an instance available from the input provenance ontology and the statistics of the sizes of the tables in the database. Experimental results are presented to show that our algorithms are efficient and scalable. The comparison with two popular relational RDF stores, Jena and Sesame, and two commercial native RDF stores, AllegroGraph and BigOWLIM, showed that our optimizations result in improved performance and scalability for provenance metadata management. Finally, our case study for provenance management in a real-life biological simulation workflow showed the production quality and capability of the RDFProv system. Although presented in the context of scientific workflow provenance management, many of our proposed techniques apply to general RDF data management as well.