Conference: International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering

Efficiency, Energy Efficiency and Programming of Accelerated HPC Servers: Highlights of PRACE Studies

Abstract

During the last few years, the architectural convergence of High-Performance Computing (HPC) systems that had prevailed for over a decade has been replaced by divergence. The divergence is driven by the quest for performance, cost-performance and, in the last few years, also energy consumption, the cost of which over the lifetime of a system has in many cases come to exceed the cost of the HPC system itself. Mass-market specialized processors, such as the Cell Broadband Engine (CBE) and Graphics Processing Units (GPUs), have received particular attention, the latter especially after hardware support for double-precision floating-point arithmetic was introduced about three years ago. The recent support for Error Correcting Code (ECC) memory and the significantly enhanced double-precision performance in the current generation of GPUs have further solidified the interest in GPUs for HPC. To assess the issues involved in potentially deploying clusters whose nodes combine commodity microprocessors with some type of specialized processor, for enhanced performance, enhanced energy efficiency, or both, on science and engineering workloads, PRACE, the Partnership for Advanced Computing in Europe, undertook a study that included three types of accelerators (the CBE, GPUs and ClearSpeed) and the tools for programming them. The study focused on assessing performance, efficiency, power efficiency for double-precision arithmetic, and programmer productivity. Four kernels (matrix multiplication, sparse matrix-vector multiplication, FFT and random number generation) were used for the assessment, together with High-Performance Linpack (HPL) and a few application codes. We report here on the results from the kernels and HPL for GPU-accelerated and ClearSpeed-accelerated systems. On sparse matrix-vector multiplication, the GPU performed significantly better than the CPU, which was surprising, while the ClearSpeed accelerator performed surprisingly poorly. For matrix multiplication, HPL and FFT, the ClearSpeed accelerator was by far the most energy-efficient device.
