
A Load-Distributed Linpack Implementation for Heterogeneous Clusters


Abstract

In recent years, heterogeneous HPC systems, which combine traditional processors with accelerator cards such as GPUs, have been shown to deliver superior performance and power efficiency. Since different scientific problems place different demands on the computer architecture, some general-purpose supercomputers consist of different types of nodes, each best suited to certain applications. Such clusters, with inter-node heterogeneity (different types of nodes) on top of intra-node heterogeneity (different processors inside one node), consist of compute nodes with different compute performance. The standard implementation of the Linpack benchmark, HPL, distributes the workload evenly among all processes and thus cannot exploit the cluster's full potential if the nodes have unequal performance. This paper presents a new feature of our HPL-GPU implementation that allows a balanced, fine-tuned workload distribution among all compute nodes, taking into account their individual compute capabilities. We present results for nodes of different speed grades on the LOEWE-CSC cluster and demonstrate that our implementation can utilize all nodes of a heterogeneous configuration efficiently, showing only about 3% granularity loss.
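The idea of a performance-weighted distribution, and why whole-block granularity costs some efficiency, can be illustrated with a minimal sketch. This is not the HPL-GPU algorithm, which the abstract does not describe; it assumes a simplified model in which a fixed number of equally sized blocks is split among nodes in proportion to a relative performance figure per node. The function names `distribute_blocks` and `efficiency` and the performance values are hypothetical.

```python
from typing import List


def distribute_blocks(num_blocks: int, perf: List[float]) -> List[int]:
    """Split whole blocks among nodes in proportion to relative performance.

    Assumption: perf[i] > 0 is a relative speed figure for node i.
    Uses the largest-remainder method so the counts sum to num_blocks.
    """
    total = sum(perf)
    ideal = [num_blocks * p / total for p in perf]  # fractional share per node
    counts = [int(x) for x in ideal]                # round each share down
    leftover = num_blocks - sum(counts)
    # Hand the remaining blocks to the nodes with the largest fractional part.
    order = sorted(range(len(perf)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def efficiency(counts: List[int], perf: List[float]) -> float:
    """Ratio of the ideal (perfectly divisible) run time to the actual run time.

    The actual run time is set by the node that finishes last.
    """
    finish = max(c / p for c, p in zip(counts, perf))
    ideal = sum(counts) / sum(perf)
    return ideal / finish


if __name__ == "__main__":
    perf = [1.0, 2.0]                  # hypothetical: second node is twice as fast
    counts = distribute_blocks(10, perf)
    print(counts, efficiency(counts, perf))
```

In this toy model, 10 blocks on two nodes with relative speeds 1 and 2 are split 3 and 7, and the faster node finishes slightly later than the ideal time because blocks cannot be divided. A rounding effect of this kind is one plausible reading of "granularity loss", though the abstract does not define the term.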
