Journal of Supercomputing

Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters



Abstract

With the emergence of high-performance data analytics, the Hadoop platform is increasingly used to process data stored on high-performance computing (HPC) clusters. These modern clusters are equipped with high-speed interconnects such as InfiniBand and 10/40 GigE and with storage systems such as SSDs and Lustre, so there is immense scope for improving the performance of Hadoop MapReduce on them, including its network-intensive shuffle phase. To exploit that scope, it is essential to study the MapReduce component in isolation. In this paper, we study popular MapReduce workloads, drawn from well-accepted, comprehensive benchmark suites, to identify common shuffle data distribution patterns. We determine the environmental and workload-specific factors that affect the performance of a MapReduce job. Based on these characterization studies, we propose a micro-benchmark suite for evaluating the performance of stand-alone Hadoop MapReduce, and demonstrate its ease of use with different networks/protocols, Hadoop distributions, and storage architectures. Performance evaluations with the proposed micro-benchmarks show that stand-alone Hadoop MapReduce performs about 13-15% better over IPoIB than over 10 GigE, and that the RDMA-enhanced hybrid MapReduce design achieves up to 43% performance improvement over default Hadoop MapReduce over IPoIB, in both shared-nothing and shared-storage architectures.
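To make the notion of a "shuffle data distribution pattern" concrete, the following is a minimal sketch of how intermediate records might be spread across reducers under three illustrative patterns. It is not the micro-benchmark suite described above: the pattern names (`even`, `random`, `skewed`), the function names, and the 50% skew fraction are all assumptions chosen for illustration, and the code simulates only record counts, not Hadoop itself.

```python
import random
from collections import Counter


def assign_reducers(num_records, num_reducers, pattern, seed=0):
    """Return the number of intermediate records sent to each reducer.

    The pattern labels are hypothetical, chosen for this sketch:
      "even"   - records are spread round-robin across reducers
      "random" - each record goes to a uniformly random reducer
      "skewed" - about half of the records are forced to reducer 0,
                 the rest go to a uniformly random reducer
    """
    rng = random.Random(seed)
    counts = Counter()
    for i in range(num_records):
        if pattern == "even":
            reducer = i % num_reducers
        elif pattern == "random":
            reducer = rng.randrange(num_reducers)
        elif pattern == "skewed":
            reducer = 0 if rng.random() < 0.5 else rng.randrange(num_reducers)
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        counts[reducer] += 1
    return [counts[r] for r in range(num_reducers)]


def imbalance(loads):
    """Ratio of the busiest reducer's load to the mean load (1.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


if __name__ == "__main__":
    for pattern in ("even", "random", "skewed"):
        loads = assign_reducers(10_000, 8, pattern)
        print(pattern, loads, round(imbalance(loads), 2))
```

The imbalance ratio gives a rough sense of why the distribution matters: under a skewed pattern one reducer receives far more shuffle traffic than the others, so the network and storage behavior of the job can differ markedly from the evenly distributed case.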
