Journal: Information Systems

GoFast: Graph-based optimization for efficient and scalable query evaluation

Abstract

The popularity of the Resource Description Framework (RDF) and SPARQL has driven the development of high-performance systems to manage data represented with this model. Earlier approaches adapted the well-established relational model, applying its storage, query processing, and optimization strategies. However, techniques borrowed from the relational model are not universally applicable in the RDF context. First, the schema-free nature of RDF induces heavy join overhead. Also, optimization strategies that try to find the optimal join order rely on error-prone statistics that cannot capture all the correlations among triples. Graph-based approaches preserve the graph structure of RDF, representing the data directly as a graph. Their execution model relies on graph exploration operators to find subgraph matches for a query. Although they have been shown to outperform relational-based systems on complex queries, they scale poorly, and their optimization techniques are entirely system-dependent. Recently, systems such as RDF_QDAG have shown that combining graph exploration with triple clustering can achieve a good compromise between performance and scalability. In this paper, we propose optimization strategies for this kind of RDF management system. First, we define novel statistics collected for clusters of triples to better capture the dependencies found in the original graph. Second, we redefine an execution plan based on these logical structures, which makes it possible to represent the RDF graph exploration process. Third, we introduce an algorithm for selecting the optimal execution plan based on a customized cost model. Finally, we propose a new approach to refine the chosen plan by pruning invalid clusters that do not participate in the construction of the final query results. All our proposals are validated experimentally using well-known RDF benchmarks. (C) 2021 Elsevier Ltd. All rights reserved.
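The abstract only names the final refinement step: pruning clusters that cannot contribute to the query result. As an illustration only, the following Python sketch shows one generic way such pruning can work. It runs an arc-consistency-style fixpoint over a cluster-level summary graph.

This is not the authors' algorithm, and it is not an RDF_QDAG API. Everything in the sketch is hypothetical:

- the data model, in which each triple pattern is a pair of variable names, each pattern has a set of candidate clusters, and an undirected `links` relation says which clusters can be joined;
- the function name `prune_clusters`.

```python
from collections import deque


def prune_clusters(patterns, candidates, links):
    """Hypothetical sketch: drop candidate clusters that have no linked
    partner in some pattern sharing a query variable, repeating until
    nothing changes (an arc-consistency-style fixpoint).

    patterns:   dict pattern_id -> (subject_var, object_var)
    candidates: dict pattern_id -> set of cluster ids
    links:      set of (cluster_a, cluster_b) pairs, treated as undirected
    """

    def linked(a, b):
        # Assumed summary-graph check: clusters a and b can be joined.
        return (a, b) in links or (b, a) in links

    cands = {p: set(cs) for p, cs in candidates.items()}

    # Two patterns are adjacent if they share a query variable.
    ids = list(patterns)
    neighbours = {p: [] for p in ids}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if set(patterns[p]) & set(patterns[q]):
                neighbours[p].append(q)
                neighbours[q].append(p)

    queue = deque(ids)
    while queue:
        p = queue.popleft()
        removed = False
        for c in list(cands[p]):
            # Cluster c survives only if every adjacent pattern still has
            # at least one candidate cluster linked to it.
            if not all(any(linked(c, d) for d in cands[q]) for q in neighbours[p]):
                cands[p].discard(c)
                removed = True
        if removed:
            # Shrinking p's candidates may invalidate clusters of its neighbours.
            queue.extend(neighbours[p])
    return cands


if __name__ == "__main__":
    pats = {"t1": ("?x", "?y"), "t2": ("?y", "?z")}
    cands = {"t1": {"A", "B"}, "t2": {"C", "D"}}
    print(prune_clusters(pats, cands, {("A", "C")}))
```

In the usage example, only clusters `A` and `C` are linked. Clusters `B` and `D` are therefore pruned, because they cannot take part in any match of the two-pattern chain.

An empty candidate set for any pattern would mean the query has no answers. Whatever the authors' actual refinement does, the general payoff of this kind of pruning is that the explorer never visits clusters that cannot reach the final result.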
