Journal: Performance Evaluation

A time-energy performance analysis of MapReduce on heterogeneous systems with GPUs



Abstract

Motivated by the explosion of Big Data analytics, performance improvements in low-power (wimpy) systems, and the increasing energy efficiency of GPUs, this paper presents a time-energy performance analysis of MapReduce on heterogeneous systems with GPUs. We evaluate the time and energy performance of three MapReduce applications with diverse resource demands on a Hadoop-CUDA framework. As executing these applications on heterogeneous systems with GPUs is challenging, we introduce a novel lazy processing technique which requires no modifications to the underlying Hadoop framework. To analyze the impact of heterogeneity, we compare heterogeneous CPU+GPU execution with homogeneous CPU-only execution across three systems with diverse characteristics: (i) a traditional high-performance (brawny) Intel i7 system hosting a discrete 640-core Nvidia GPU of the latest Maxwell generation, (ii) a wimpy platform consisting of a quad-core ARM Cortex-A9 hosting the same discrete Maxwell GPU, and (iii) a wimpy platform integrating four ARM Cortex-A15 cores and 192 Nvidia Kepler GPU cores on the same chip. These systems encompass both intra-node heterogeneity with discrete GPUs and intra-chip heterogeneity with integrated GPUs. Our measurement-based performance analysis highlights the following results. For compute-intensive workloads, the brawny heterogeneous system achieves speedups of up to 2.3 and reduces the energy usage by almost half compared to the brawny homogeneous system. As expected, for applications where data transfers dominate the execution time, heterogeneity exhibits worse time-energy performance compared to homogeneous systems. For such applications, the heterogeneous wimpy A9 system with a discrete GPU uses around 14 times the energy of the homogeneous A9 system due to both system resource imbalances and the high power overhead of the discrete GPU. However, comparing among heterogeneous systems, the wimpy A15 with an integrated GPU uses the lowest energy across all workloads. This allows us to establish an execution time equivalence ratio between a single brawny node and multiple wimpy nodes. Based on this equivalence ratio, the wimpy nodes exhibit energy savings of two-thirds while maintaining the same execution time. This result advocates the potential usage of heterogeneous wimpy systems with integrated GPUs for Big Data analytics. © 2015 Elsevier B.V. All rights reserved.
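
The abstract does not state how the execution time equivalence ratio is computed. The sketch below shows one plausible reading, under two assumptions that are not taken from the paper: work scales ideally (linearly) across wimpy nodes, and energy is average power multiplied by execution time. The function names and all numeric values are hypothetical illustrations, not measurements from the paper.

```python
import math


def energy_joules(avg_power_watts: float, time_seconds: float) -> float:
    """Energy as average power times execution time (assumed model)."""
    return avg_power_watts * time_seconds


def equivalence_ratio(wimpy_time_s: float, brawny_time_s: float) -> int:
    """Smallest number of wimpy nodes whose combined execution time is no
    worse than one brawny node, assuming ideal linear scaling."""
    return math.ceil(wimpy_time_s / brawny_time_s)


def energy_savings(brawny_power_w: float, brawny_time_s: float,
                   wimpy_power_w: float, wimpy_time_s: float):
    """Return (node count, fraction of energy saved by the wimpy cluster)."""
    n = equivalence_ratio(wimpy_time_s, brawny_time_s)
    # Under ideal scaling, each of the n nodes runs for 1/n of the
    # single-node wimpy execution time.
    cluster_time_s = wimpy_time_s / n
    brawny_energy = energy_joules(brawny_power_w, brawny_time_s)
    wimpy_energy = n * energy_joules(wimpy_power_w, cluster_time_s)
    return n, 1.0 - wimpy_energy / brawny_energy


if __name__ == "__main__":
    # Hypothetical inputs: a 100 W brawny node finishing in 60 s and a
    # 12 W wimpy node finishing the same job in 150 s.
    nodes, saved = energy_savings(100.0, 60.0, 12.0, 150.0)
    print(f"wimpy nodes needed: {nodes}, energy saved: {saved:.0%}")
```

Real clusters rarely scale ideally, so a measured ratio would typically need more wimpy nodes than this estimate suggests.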
