Journal: Knowledge and Information Systems

Dynamic and fast processing of queries on large-scale RDF data



Abstract

As RDF data continue to gain popularity, RDF datasets are growing rapidly in both the number of repositories and the size of individual datasets. Many known RDF datasets contain billions of RDF triples (subject, predicate and object). One of the grand challenges in managing such large RDF data is how to execute RDF queries efficiently. In this paper, we address the query processing problems posed by the billion-triple challenge. We first identify some causes of the problems in existing query optimization schemes, such as large intermediate results and errors in initial query cost estimation. Then, we present our block-oriented dynamic query plan generation approach, powered by pipelined execution. Our approach consists of two phases. In the first phase, a near-optimal execution plan for a query is chosen by identifying the processing blocks of the query. We group the join patterns sharing a join variable into building blocks of the query plan, since executing them first provides opportunities to reduce the size of the intermediate results generated. In the second phase, we further optimize the initial pipelining for a given query plan. We employ optimization techniques, such as sideways information passing and semi-join, to further reduce the size of intermediate results, improve query processing cost estimation and speed up query execution. Experimental results on several RDF datasets of over a billion triples demonstrate that our approach outperforms existing RDF query engines that rely on static query processing strategies based on dynamic programming.
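
The two ideas named in the abstract, grouping join patterns that share a join variable and shrinking intermediate results with a semi-join, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`group_by_join_variable`, `match`, `semi_join`), the in-memory triple list and the example data are all assumptions made for illustration, and a real engine would work over indexes and pipelined operators rather than Python lists.

```python
from collections import defaultdict

# A triple pattern is (subject, predicate, object); terms starting with "?"
# are variables. All names and data below are hypothetical.

def variables(pattern):
    """Return the set of variables used in one triple pattern."""
    return {term for term in pattern if term.startswith("?")}

def group_by_join_variable(patterns):
    """Group patterns by each variable shared by at least two patterns."""
    groups = defaultdict(list)
    for pattern in patterns:
        for var in variables(pattern):
            groups[var].append(pattern)
    # Only variables that occur in several patterns are join variables.
    return {var: ps for var, ps in groups.items() if len(ps) > 1}

def match(pattern, triples):
    """Return one binding dict per triple that matches the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

def semi_join(left, right, var):
    """Keep the left bindings whose value for var also occurs in right."""
    keys = {binding[var] for binding in right}
    return [binding for binding in left if binding[var] in keys]

# Hypothetical data: who works where, and who lives where.
triples = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
    ("alice", "livesIn", "paris"),
    ("carol", "livesIn", "rome"),
    ("dave", "livesIn", "oslo"),
]
patterns = [("?x", "worksAt", "?c"), ("?x", "livesIn", "?city")]

blocks = group_by_join_variable(patterns)   # patterns sharing ?x form a block
works = match(patterns[0], triples)
lives = match(patterns[1], triples)
reduced = semi_join(works, lives, "?x")     # drop bindings that cannot join
```

In this sketch the semi-join filters the `worksAt` bindings down to the subjects that also have a `livesIn` triple before any full join is computed, which is the kind of intermediate-result reduction the abstract refers to.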
