Journal: VLDB Journal

The RDF-3X engine for scalable management of RDF data

Abstract

RDF is a data model for schema-free structured information that is gaining momentum in the context of Semantic-Web data, life sciences, and Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries, including those with long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined indexing and query processing. The physical design is identical for all RDF-3X databases regardless of their workloads, and it completely eliminates the need for index tuning by maintaining exhaustive indexes for all permutations of subject-property-object triples and their binary and unary projections. These indexes are highly compressed, and the query processor can aggressively leverage fast merge joins that make excellent use of processor caches. The query optimizer is able to choose optimal join orders even for complex queries, with a cost model that includes statistical synopses for entire join paths. Although RDF-3X is optimized for queries, it also provides good support for efficient online updates by means of a staging architecture: direct updates to the main database indexes are deferred and instead applied to compact differential indexes, which are later merged into the main indexes in batches. Experimental studies with several large-scale datasets of more than 50 million RDF triples, and with benchmark queries that include pattern matching, many-way star-joins, and long path-joins, demonstrate that RDF-3X can outperform the previously best alternatives by one or two orders of magnitude.
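To make the indexing idea in the abstract concrete, here is a minimal Python sketch of exhaustive permutation indexes combined with a merge join. It is an illustration under stated assumptions, not the RDF-3X implementation: the toy triples, the names `build_indexes`, `scan`, `merge_join`, and `who_knows_and_works_at` are all hypothetical, plain sorted lists stand in for the compressed indexes, and the binary and unary projections, the query optimizer, and the differential update indexes are omitted.

```python
from bisect import bisect_left
from itertools import permutations

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
]

# All six orderings of subject (s), property (p), object (o).
ORDERS = ["".join(p) for p in permutations("spo")]


def build_indexes(triples):
    """Return one sorted list per permutation of (s, p, o)."""
    pos = {"s": 0, "p": 1, "o": 2}
    return {
        order: sorted(tuple(t[pos[c]] for c in order) for t in triples)
        for order in ORDERS
    }


def scan(index, prefix):
    """Yield, in sorted order, all index entries that start with `prefix`."""
    i = bisect_left(index, prefix)
    while i < len(index) and index[i][: len(prefix)] == prefix:
        yield index[i]
        i += 1


def merge_join(left, right, left_key, right_key):
    """Join two inputs that are already sorted on their join keys."""
    left, right = list(left), list(right)
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it
            # with every left row that carries the same key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            while i < len(left) and left_key(left[i]) == lk:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end


def who_knows_and_works_at(indexes, employer):
    """Pattern: ?x knows ?y . ?x worksAt <employer>; returns (x, y) pairs."""
    # Both scans come back sorted on ?x, so no extra sort is needed.
    knows = scan(indexes["pso"], ("knows",))               # (p, s, o)
    works = scan(indexes["pos"], ("worksAt", employer))    # (p, o, s)
    joined = merge_join(knows, works, lambda t: t[1], lambda t: t[2])
    return [(k[1], k[2]) for k, _ in joined]


if __name__ == "__main__":
    print(who_knows_and_works_at(build_indexes(TRIPLES), "acme"))
```

The point of the sketch is that, with every ordering available, each triple pattern can be read from an index that is already sorted on the join variable, so the join reduces to a single linear pass over both inputs.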
