Conference: International Conference on Cloud Computing and Big Data

Performance Evaluation for Distributed Join Based on MapReduce

Abstract

Inner join is a fundamental and frequent operation in large-scale data analysis, and MapReduce is the most widely available framework for such analysis. A variety of inner-join algorithms have been proposed for the MapReduce environment. These algorithms are usually designed for specific scenarios, yet the performance of an inner join can vary widely with data volume, reference ratio, data skew rate, running environment, and other factors. This paper summarizes the well-known join algorithms and implements them in a uniform MapReduce environment. Taking into account the number of tables, broadcast cost, data skew, join rate, and related factors, we designed and conducted a large number of experiments to compare the time cost of these join algorithms. Based on the experimental results, we analyze and summarize the performance and applicability of each algorithm in different scenarios, providing a reference for improving the performance of large-scale data analysis under different circumstances.
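
The abstract does not name the algorithms that were evaluated. As a rough illustration of why factors such as broadcast cost and data skew matter, the sketch below simulates in plain Python two join strategies that are commonly described for MapReduce: a reduce-side (repartition) join and a map-side (broadcast) join. The function names, the in-memory "shuffle", and the sample tables are assumptions made for illustration only; they are not taken from the paper and do not use any Hadoop API.

```python
from collections import defaultdict


def repartition_join(left, right):
    """Reduce-side inner join, simulated in memory (illustrative only)."""
    # Map phase: tag every record with the table it came from.
    mapped = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]
    # Shuffle phase: group all tagged records by join key.
    # A heavily skewed key would put most records into a single group here.
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, (tag, value) in mapped:
        groups[key][tag].append(value)
    # Reduce phase: per key, pair every left value with every right value.
    return [(k, l, r) for k, g in groups.items() for l in g["L"] for r in g["R"]]


def broadcast_join(large, small):
    """Map-side inner join, simulated in memory (illustrative only)."""
    # The small table is turned into a hash table that every mapper would
    # receive in full; this copy is the broadcast cost.
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # Each record of the large table is joined locally, with no shuffle.
    return [(k, l, s) for k, l in large for s in lookup.get(k, [])]


# Hypothetical sample tables of (join_key, payload) pairs.
orders = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
users = [(1, "x"), (2, "y"), (4, "z")]

# Both strategies produce the same set of joined rows.
assert sorted(repartition_join(orders, users)) == sorted(broadcast_join(orders, users))
```

In this sketch the repartition join moves every record of both tables through the shuffle, while the broadcast join avoids the shuffle but requires the small table to fit in each mapper's memory. Trade-offs of this kind are what an experimental comparison like the one described above would measure.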
