Conference: International Conference on Algorithms and Architectures for Parallel Processing

Equi-join for Multiple Datasets Based on Time Cost Evaluation Model


Abstract

MapReduce is an important programming model for processing big data with parallel, distributed algorithms on a cluster. Equi-join is an important operation in big data analytics applications, but it is inefficient to perform in MapReduce when multiple datasets are involved in the join. In this paper, a time cost evaluation model for equi-joins is extended to account for the time cost of calculation. In addition, the sub-joins in an equi-join are classified into star pattern sub-joins on a single attribute and chain pattern sub-joins. Based on the extended model, optimization methods are presented and an equi-join plan with lower time cost is chosen. The optimization proceeds in three steps: first, the star pattern sub-joins on one attribute are processed; next, the chain pattern sub-join with the smallest intermediate result (i.e., the fewest tuples in the intermediate result) is processed; finally, each chain pattern sub-join is decomposed by dynamic programming into either a single MapReduce job or several MapReduce jobs to obtain an optimal scheme for that sub-join. We conducted extensive experiments, and the results show that our method is more efficient than existing methods such as MDMJ, Hive, and Pig.
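The last optimization step, splitting a chain pattern sub-join into one or several MapReduce jobs by dynamic programming, can be illustrated with a minimal sketch. The abstract does not give the paper's cost formulas, so everything below is an assumption for illustration only: the function names `plan_chain` and `toy_cost`, the convention that job `(j, i)` extends the join of the first `j` datasets to the first `i` datasets, and the toy cost itself (a fixed per-job overhead plus a term that grows with the number of datasets joined in one job). It is not the authors' model.

```python
from typing import Callable, List, Tuple


def plan_chain(n: int, job_cost: Callable[[int, int], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Choose how to split a chain join over n datasets into MapReduce jobs.

    Hypothetical convention: job (j, i) takes the join of the first j datasets
    and extends it to the first i datasets in a single job; job_cost(j, i) is
    its estimated time cost, supplied by the caller.
    """
    if n < 2:
        raise ValueError("a chain join needs at least two datasets")

    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: cheapest cost found for joining the first i datasets
    prev = [0] * (n + 1)     # prev[i]: the j of the last job (j, i) in that cheapest plan
    best[1] = 0.0            # a single dataset needs no join job

    for i in range(2, n + 1):
        for j in range(1, i):
            candidate = best[j] + job_cost(j, i)
            if candidate < best[i]:
                best[i] = candidate
                prev[i] = j

    # Walk the back-pointers to recover the sequence of jobs.
    jobs: List[Tuple[int, int]] = []
    i = n
    while i > 1:
        jobs.append((prev[i], i))
        i = prev[i]
    jobs.reverse()
    return best[n], jobs


def toy_cost(overhead: float) -> Callable[[int, int], float]:
    """Assumed toy cost: per-job overhead plus a quadratic penalty on job width."""
    return lambda j, i: overhead + (i - j) ** 2


if __name__ == "__main__":
    # High per-job overhead favours one wide job; low overhead favours a cascade.
    print(plan_chain(4, toy_cost(10)))
    print(plan_chain(4, toy_cost(1)))
```

With this toy cost, a large per-job overhead makes a single job covering the whole chain cheapest, while a small overhead makes a cascade of two-way joins cheapest. A realistic `job_cost` would presumably depend on input sizes and on the number of tuples in the intermediate results, as the abstract suggests.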