IEEE International Conference on Big Data Computing Service and Applications

Power Efficient MapReduce Workload Acceleration Using Integrated-GPU

Abstract

With the pervasiveness of MapReduce, one of the most prominent programming models for data parallelism in Apache Hadoop, many researchers and developers have spent tremendous effort attempting to boost the computational speed and energy efficiency of MapReduce-based big data processing. However, the scalable and fault-tolerant nature of MapReduce introduces additional costs in disk IO and data transfer, caused by streaming intermediate outputs to disk. In light of these issues, many research projects have been initiated with the goal of improving the compute speed and power efficiency of compute-intensive cloud computing workloads, several of them by adding discrete GPUs. In this work, we present a modified MapReduce approach, focused on the iterative clustering algorithms in the Apache Mahout machine learning library, that leverages the acceleration potential of the Intel integrated GPU in a multi-node cluster environment. Evaluated with the HiBench benchmark suite, the accelerated framework shows varying levels of speed-up (~45x for Map tasks only, ~4.37x for the entire K-means clustering). Based on various experiments and in-depth analysis, we find that utilizing the integrated GPU via OpenCL offers significant performance and power efficiency gains over the original CPU-based approach. We also analyze the correlations between compute, IO and power efficiency. Our results show that embracing the integrated GPU in the Hadoop MapReduce framework is a promising way to add cost- and energy-efficient compute parallelism to a data-parallel multi-node environment.
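
To make the Map/Reduce split of an iterative clustering job concrete, here is a minimal, single-process Python sketch of K-means written in MapReduce form: the map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each cluster. In this form the map step is an independent distance computation per point, the kind of data-parallel work an OpenCL kernel could take over. This is an illustrative assumption, not the authors' OpenCL code or Mahout's implementation; the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) are hypothetical, and a dictionary stands in for Hadoop's shuffle phase.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(point: Point, centroids: List[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Hypothetical map step: emit (index of nearest centroid, (point, 1)).

    Each point is handled independently, so this loop is data-parallel.
    """
    nearest = min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values: Iterable[Tuple[Point, int]]) -> Point:
    """Hypothetical reduce step: average all points emitted for one cluster.

    Values are (vector_sum, count) pairs, so partial sums from a combiner
    are handled the same way as single points with count 1.
    """
    total: List[float] = []
    count = 0
    for vec, n in values:
        total = list(vec) if not total else [t + x for t, x in zip(total, vec)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """One MapReduce round: map every point, group by key (the 'shuffle'), reduce."""
    groups = defaultdict(list)  # stands in for Hadoop's shuffle/sort
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # empty clusters keep their previous centroid
    for key, values in groups.items():
        new_centroids[key] = kmeans_reduce(values)
    return new_centroids


def kmeans(points: List[Point], centroids: List[Point],
           max_iters: int = 20, tol: float = 1e-9) -> List[Point]:
    """Driver: repeat MapReduce rounds until centroids stop moving (or max_iters)."""
    for _ in range(max_iters):
        new = kmeans_iteration(points, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

Because each iteration is a full MapReduce round, a real Hadoop job also pays the disk IO and data-transfer costs mentioned above on every pass. This helps explain why a large speed-up of the Map phase alone need not carry over one-for-one to the whole clustering run.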