Journal of Frontiers of Computer Science and Technology (计算机科学与探索)

Heuristic Multi-Join Optimization for MapReduce under Hybrid Storage

Abstract

MapReduce has become one of the key technologies for massive data processing. However, limitations inherent in its computing framework make multi-join queries execute inefficiently. To address this problem, this paper proposes MHMO (MapReduce based heuristic multi-join optimization), a multi-join optimization method for the MapReduce framework that heuristically recommends different execution algorithms for different join patterns. For a given query containing multiple joins, MHMO first constructs a join graph to determine its join pattern, then recommends the "optimal" execution strategy for that pattern. In particular, a hybrid join is first divided into a set of simple join patterns, and a cost model is then defined to choose the execution order of these groups with minimum cost. Combined with hybrid row-column storage and deferred (late) materialization, MHMO significantly improves multi-join performance under MapReduce. Finally, experiments on the data warehouse benchmark dataset TPC-H verify the effectiveness of MHMO.
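The abstract says MHMO builds a join graph for each query and picks an execution strategy according to the join pattern it finds. As a rough illustration only, the Python sketch below builds such a graph (tables as vertices, join predicates as edges) and classifies it with simple degree-based rules. The pattern names, the function names `build_join_graph` and `classify_join_pattern`, and the classification rules are hypothetical assumptions for this example, not the paper's definitions; the sketch also assumes the join graph is connected and acyclic.

```python
from collections import defaultdict


def build_join_graph(predicates):
    """Build an undirected join graph: each table is a vertex, each join predicate an edge.

    Hypothetical helper; `predicates` is a list of (left_table, right_table) pairs.
    """
    graph = defaultdict(set)
    for left, right in predicates:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def classify_join_pattern(graph):
    """Classify a connected, acyclic join graph (assumed rules, not MHMO's own).

    Returns 'simple', 'star', 'chain' or 'hybrid'.
    """
    n = len(graph)
    degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
    edges = sum(degrees) // 2
    # A connected acyclic graph (a tree) on n vertices has exactly n - 1 edges.
    if edges != n - 1:
        raise ValueError("sketch only handles connected, acyclic join graphs")
    if n <= 2:
        return "simple"  # a single two-table join
    if degrees[0] == n - 1:
        # One center table (e.g. a fact table) joined with every other table.
        # For three tables a star and a chain coincide; this sketch reports 'star'.
        return "star"
    if degrees[0] <= 2:
        return "chain"  # every table joins with at most two neighbours
    return "hybrid"  # mixture of star and chain parts, to be split into simple groups


if __name__ == "__main__":
    # TPC-H style example: lineitem joined with orders, part and supplier.
    star = build_join_graph([("lineitem", "orders"), ("lineitem", "part"), ("lineitem", "supplier")])
    print(classify_join_pattern(star))
```

Under these assumed rules, a TPC-H chain such as lineitem, orders, customer, nation, region is reported as `chain`, while adding part and supplier around lineitem to that chain yields `hybrid`, the case the abstract says is further divided into simple patterns before a cost model orders the groups.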
