Journal: MATEC Web of Conferences

The Optimization of Cost-Model for Join Operator on Spark SQL Platform



Abstract

When Spark SQL executes a Join operation, Spark consumes large amounts of memory, network, and disk I/O resources, so the Join operation strongly affects Spark SQL performance. Improving Join performance has therefore become an urgent problem. In the latest release, Spark SQL uses Catalyst as its query optimizer. Catalyst implements both a rule-based optimization strategy (RBO) and a cost-based optimization strategy (CBO). The Catalyst CBO module has some shortcomings. First, it does not fully consider Spark's in-memory computing characteristics. Second, its cost estimation for network transfer and disk I/O is insufficient. To solve these problems and improve the performance of Spark SQL, this study proposes a cost estimation model for the Join operator that accounts for cost from four aspects: time complexity, space complexity, network transfer, and disk I/O. The most cost-efficient plan is then selected, using a hierarchical analysis method, from the equivalent physical plans generated by Spark SQL. The experimental results show that the total amount of network transmission is reduced and processor utilization is increased, so the performance of Spark SQL improves.
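The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea: score each candidate physical join plan on the four cost aspects and pick the cheapest under weights derived by a hierarchical analysis step. It assumes that the "hierarchical analysis method" is an analytic hierarchy process (AHP) and approximates the weights with the row geometric-mean method. The pairwise comparison matrix, the plan names, and all cost figures are hypothetical values chosen for illustration; none of them come from the paper or from Spark.

```python
from math import prod

# The four cost aspects named in the abstract.
CRITERIA = ("time", "space", "network", "disk")

# Hypothetical pairwise-comparison matrix: entry [i][j] states how much more
# important criterion i is than criterion j. Illustrative values only.
PAIRWISE = [
    [1.0, 2.0, 0.5, 1.0],   # time
    [0.5, 1.0, 0.25, 0.5],  # space
    [2.0, 4.0, 1.0, 2.0],   # network
    [1.0, 2.0, 0.5, 1.0],   # disk
]

# Hypothetical per-plan cost estimates in arbitrary units (assumed, not measured).
PLANS = {
    "broadcast_hash": {"time": 4.0, "space": 8.0, "network": 2.0, "disk": 1.0},
    "sort_merge": {"time": 6.0, "space": 4.0, "network": 8.0, "disk": 4.0},
    "shuffle_hash": {"time": 5.0, "space": 6.0, "network": 8.0, "disk": 2.0},
}


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def normalize(plans):
    """Scale each criterion to [0, 1] by dividing by its maximum over all plans."""
    maxima = {c: max(p[c] for p in plans.values()) or 1.0 for c in CRITERIA}
    return {
        name: {c: costs[c] / maxima[c] for c in CRITERIA}
        for name, costs in plans.items()
    }


def cheapest_plan(plans, weights):
    """Return the plan with the lowest weighted cost, plus all scores."""
    scaled = normalize(plans)
    scores = {
        name: sum(w * costs[c] for w, c in zip(weights, CRITERIA))
        for name, costs in scaled.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    weights = ahp_weights(PAIRWISE)
    best, scores = cheapest_plan(PLANS, weights)
    for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{name}: {score:.4f}")
    print("selected:", best)
```

With these assumed inputs, network transfer receives the largest weight, so a plan that moves little data across the network scores lowest. A different comparison matrix or different cost estimates would change the ranking.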
