Conference: Scientific and statistical database management

Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data



Abstract

The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust value of the datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum of 49% reduction in total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.
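
To make the contrast in the abstract concrete, the following is a minimal sketch of how per-triple provenance might be recorded with the RDF reification vocabulary versus a context-entity style. The abstract does not spell out PaCE's encoding, so the `context_entity` function, the `http://example.org/` URIs, the `derives_from` property, and the choice to contextualize only the subject are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only. Every URI under EX, the "derives_from" property,
# and the context_entity() encoding are hypothetical, not taken from the paper.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"


def reified(s, p, o, source, stmt_id):
    """Record provenance with the standard RDF reification vocabulary."""
    stmt = f"_:{stmt_id}"  # blank node standing for the statement itself
    return [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, EX + "derives_from", source),  # the actual provenance link
    ]


def context_entity(s, p, o, source, source_tag):
    """Assumed context-entity style: mint a source-specific subject URI."""
    local_name = s.rsplit("/", 1)[-1]
    s_ctx = f"{EX}{source_tag}/{local_name}"  # entity as seen in this source
    return [
        (s_ctx, p, o),                         # the provenance-aware triple
        (s_ctx, EX + "derives_from", source),  # link the entity to its source
    ]


if __name__ == "__main__":
    args = (EX + "lipoprotein", EX + "affects", EX + "inflammatory_cells")
    src = EX + "source42"
    print(len(reified(*args, src, "s1")), "triples with reification")
    print(len(context_entity(*args, src, "source42")), "triples with context entity")
```

In this toy case the reified form needs five triples and the context-entity form two. These counts illustrate the idea only and are unrelated to the evaluation figures reported above.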
