Cost-Based Join Algorithm Selection in Hadoop

Abstract

In recent years, MapReduce has become a popular computing framework for big data analysis. Join is a major query type in data analysis, and various algorithms have been designed to process join queries on top of Hadoop. Because the relative efficiency of these algorithms depends on the join task at hand, users must select an appropriate algorithm and configure it properly to achieve good performance, which is difficult for many end users. This paper proposes a cost model that estimates the cost of four popular join algorithms. Based on the cost model, the system can automatically choose the join algorithm with the lowest estimated cost and then suggest reasonable configuration values for it. Experimental results on the TPC-H benchmark verify that the proposed method correctly chooses the best join algorithm, and that the chosen algorithm achieves a speedup of around 1.25 times over the default join algorithm.
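The selection step described in the abstract can be illustrated with a minimal sketch: estimate a cost for each candidate join algorithm from input statistics, then pick the cheapest. The abstract does not name the four algorithms or give their cost formulas, so everything below is an assumption made for illustration. `JoinStats`, `repartition_cost`, `broadcast_cost`, and `choose_join` are hypothetical names, only two candidate algorithms are shown, and the constants are placeholders rather than values from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class JoinStats:
    """Input statistics for one two-table join (all sizes in MB)."""
    left_mb: float
    right_mb: float
    memory_mb: float  # memory available to each task


# Placeholder cost functions. These are NOT the paper's formulas.
def repartition_cost(s: JoinStats) -> float:
    # Assumed: both inputs are read, shuffled, and read again by reducers.
    return 3.0 * (s.left_mb + s.right_mb)


def broadcast_cost(s: JoinStats) -> float:
    # Assumed: the smaller input is copied to every task and must fit in memory.
    small, large = sorted((s.left_mb, s.right_mb))
    if small > s.memory_mb:
        return float("inf")  # infeasible, so it can never be selected
    return large + 10.0 * small


COST_MODELS: Dict[str, Callable[[JoinStats], float]] = {
    "repartition": repartition_cost,
    "broadcast": broadcast_cost,
}


def choose_join(
    stats: JoinStats,
    models: Dict[str, Callable[[JoinStats], float]] = COST_MODELS,
) -> Tuple[str, float]:
    """Return the name and estimated cost of the cheapest algorithm."""
    costs = {name: estimate(stats) for name, estimate in models.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


if __name__ == "__main__":
    # A small table joined with a large one favors the broadcast estimate.
    print(choose_join(JoinStats(left_mb=10, right_mb=1000, memory_mb=512)))
    # Two large tables make broadcast infeasible under these assumptions.
    print(choose_join(JoinStats(left_mb=2000, right_mb=3000, memory_mb=512)))
```

Keeping the cost functions in a dictionary means further algorithms can be added without changing the selection logic. A real cost model would also need cluster parameters such as I/O and network throughput, which this sketch leaves out.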