A Load-Distributed Linpack Implementation for Heterogeneous Clusters

Abstract

In recent years, heterogeneous HPC systems, which combine traditional processors with accelerator cards such as GPUs, have been shown to deliver superior performance and power efficiency. Since different scientific problems pose different demands on the computer architecture, some general-purpose supercomputers consist of different types of nodes, where each type is best suited for certain applications. Such clusters, with inter-node heterogeneity (different types of nodes) on top of intra-node heterogeneity (different processors inside one node), consist of compute nodes with different compute performance. The standard implementation of the Linpack benchmark, HPL, distributes the workload evenly among all processes and thus cannot exploit the cluster's full potential if the nodes have unequal performance. This paper presents a new feature of our HPL-GPU implementation that allows a balanced, fine-tuned workload distribution among all compute nodes, taking into account their individual compute capabilities. We present results on some nodes of different speed grades on the LOEWE-CSC cluster and demonstrate that our implementation can utilize all nodes of a heterogeneous configuration efficiently, showing only about 3% granularity loss.
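
The idea of a performance-weighted distribution, and of the granularity loss that remains because work can only be handed out in whole blocks, can be illustrated with a small sketch. This is not the HPL-GPU algorithm described in the paper: it assumes a simplified one-dimensional, static split of equally sized blocks, and the function names `distribute_blocks` and `granularity_loss` as well as the performance figures are hypothetical.

```python
def distribute_blocks(num_blocks, perf):
    """Assign whole blocks to nodes in proportion to their performance.

    Assumption: `perf` holds one relative performance value per node.
    Rounding uses the largest-remainder method, so the counts always
    sum to `num_blocks`.
    """
    total = sum(perf)
    ideal = [num_blocks * p / total for p in perf]   # fractional shares
    counts = [int(x) for x in ideal]                 # round down first
    leftover = num_blocks - sum(counts)
    # Hand the remaining blocks to the nodes with the largest remainders.
    order = sorted(range(len(perf)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def granularity_loss(counts, perf):
    """Fraction of aggregate performance lost to imbalance.

    The run time is set by the node that finishes last; the ideal run
    time assumes the work could be split perfectly.
    """
    runtime = max(c / p for c, p in zip(counts, perf))
    ideal_runtime = sum(counts) / sum(perf)
    return 1.0 - ideal_runtime / runtime
```

With hypothetical relative speeds `[1.0, 1.0, 2.0]` and 12 blocks, `distribute_blocks` gives the fast node twice as many blocks as each slow node, and `granularity_loss` reports no loss, whereas an even split of 4 blocks per node leaves the fast node idle for part of the run. When the block count does not divide evenly, the rounding leaves a residual loss, which is the kind of effect the abstract refers to as granularity loss.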

Bibliographic record

Source: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, 2015 IEEE 12th International Conference on Embedded Software and Systems, 2015, pp. 436-443 (8 pages)
Conference location: New York, NY (US)
Authors: Rohr David; Lindenstruth Volker
Author affiliation: Frankfurt Inst. for Adv. Studies, Frankfurt, Germany
Original format: PDF
Language: English
Keywords:
computer architecture; mainframes; parallel machines; software libraries; HPL-GPU implementation; LOEWE-CSC cluster; Linpack benchmark; accelerator cards; balanced fine-tuned workload distribution; compute nodes; computer architecture; general purpose supercomputers; heterogeneous HPC systems; heterogeneous clusters; internode heterogeneity; intranode heterogeneity; load-distributed Linpack implementation; Benchmark testing; Graphics processing units; Hardware; Niobium; Standards; Supercomputers; GPU; HPC; HPL; HPL-GPU; H;

Date indexed: 2022-08-26 13:53:54

Similar literature

1. Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes [J]. Jo Gangwon, Nah Jeongho, Lee Jun, Parallel and Distributed Systems, IEEE Transactions on. 2015, No. 7
2. Multi-GPU DGEMM and High Performance Linpack on Highly Energy-Efficient Clusters [J]. Rohr David, Bach Matthias, Kretz Matthias, Micro, IEEE. 2011, No. 5
3. Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster [J]. Junichi OHMURA, Takefumi MIYOSHI, Hidetsugu IRIE, IEICE transactions on information and systems. 2011, No. 12
4. A Load-Distributed Linpack Implementation for Heterogeneous Clusters [C]. Rohr David, Lindenstruth Volker. IEEE International Conference on High Performance Computing and Communications. 2015
5. Towards Mitigating Co-incident Peak Power Consumption and Managing Energy Utilization in Heterogeneous Clusters [D]. Rueda, Renan Delvalle. 2018
6. Performance evaluation results of evolutionary clustering algorithm star for clustering heterogeneous datasets [O]. Bryar A. Hassan, Tarik A. Rashid, Seyedali Mirjalili. 2021
7. Accelerating Linpack Performance with Mixed Precision Algorithm on CPU+GPGPU Heterogeneous Cluster [O]. Wang Lei, Zhang Yunquan, Zhang Xianyi, 2014
