2010 IEEE International Conference on Cluster Computing

Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing


Abstract

In this paper, we describe our experiment in developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer system and the largest GPU-accelerated system attempted to date. An adaptive optimization framework is presented to balance the workload distribution across the GPUs and CPUs with negligible runtime overhead, resulting in better performance than static or training-based partitioning methods. The CPU-GPU communication overhead is effectively hidden by a software pipelining technique, which is particularly useful for large memory-bound applications. Combined with other traditional optimizations, the Linpack implementation we optimized using the adaptive optimization framework achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability and 3.3 times faster than the result using the vendor's library. On the full configuration of TianHe-1, our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list released in November 2009.
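The abstract does not spell out how the adaptive framework chooses the CPU/GPU split, so the following is only an illustrative sketch of one common approach, not the authors' method. It assumes the split is re-estimated after each step from the measured time each device took on its share, so that both would finish together on the next step. The names `update_gpu_fraction` and `simulate` and the device speeds are hypothetical.

```python
def update_gpu_fraction(gpu_fraction, gpu_time, cpu_time):
    """Return the GPU share that balances finish times, given the last step.

    gpu_fraction: share of the work given to the GPU in the last step (0..1).
    gpu_time, cpu_time: measured wall time each device spent on its share.
    """
    if not 0.0 < gpu_fraction < 1.0:
        raise ValueError("gpu_fraction must be strictly between 0 and 1")
    if gpu_time <= 0.0 or cpu_time <= 0.0:
        raise ValueError("measured times must be positive")
    # Observed throughput of each device, in work units per second.
    gpu_rate = gpu_fraction / gpu_time
    cpu_rate = (1.0 - gpu_fraction) / cpu_time
    # Splitting work in proportion to throughput equalizes the finish times.
    return gpu_rate / (gpu_rate + cpu_rate)


def simulate(gpu_speed, cpu_speed, steps, start_fraction=0.5):
    """Run the update rule against hypothetical constant device speeds."""
    fraction = start_fraction
    history = [fraction]
    for _ in range(steps):
        # Stand-in for timing real kernels: time = work / speed.
        gpu_time = fraction / gpu_speed
        cpu_time = (1.0 - fraction) / cpu_speed
        fraction = update_gpu_fraction(fraction, gpu_time, cpu_time)
        history.append(fraction)
    return history


if __name__ == "__main__":
    # Hypothetical case: the GPU is three times as fast as the CPU.
    print(simulate(gpu_speed=3.0, cpu_speed=1.0, steps=3))
```

Because the update uses only two timings from the previous step, its cost per step is a few arithmetic operations, which is in the spirit of the negligible runtime overhead the abstract claims. A static split would keep the starting fraction regardless of the measured speeds.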
