IEEE Transactions on Parallel and Distributed Systems

Multi-GPU Parallelization of the NAS Multi-Zone Parallel Benchmarks


Abstract

GPU-based computing systems have become a widely accepted solution in the high-performance computing (HPC) domain. GPUs offer highly competitive performance-per-watt ratios and can exploit an astonishing level of parallelism. However, exploiting the peak performance of such devices is a challenge, mainly due to the combination of two essential aspects of multi-GPU execution. On one hand, the workload should be distributed evenly among the GPUs. On the other hand, communication between GPU devices is costly and should be minimized. Therefore, the trade-off between work-distribution schemes and communication overheads conditions the overall performance of parallel applications run on multi-GPU systems. In this article we present a multi-GPU implementation of the NAS Multi-Zone Parallel Benchmarks, whose execution alternates communication and computational phases. We propose several work-distribution strategies that try to distribute the workload evenly among the GPUs. Our evaluations show that performance is highly sensitive to the distribution strategy, as the communication phases of the applications are heavily affected by the work-distribution schemes applied in the computational phases. In particular, we consider Static, Dynamic, and Guided schedulers to find a trade-off between both phases that maximizes overall performance. In addition, we compare those schedulers with an optimal schedule computed offline using IBM CPLEX. On an evaluation environment composed of 2x IBM Power9 8335-GTH and 4x NVIDIA V100 (Volta) GPUs, our multi-GPU parallelization outperforms single-GPU execution by 1.48x to 1.86x (2 GPUs) and by 1.75x to 3.54x (4 GPUs). This article analyses these improvements in terms of the relationship between the computational and communication phases of the applications as the number of GPUs increases. We show that Guided schedulers perform at a level similar to that of the optimal schedulers.
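The load-balancing problem the abstract describes can be sketched as follows. This is an illustrative example only, not the paper's actual implementation: the multi-zone benchmarks have zones of unequal sizes, and a scheduler must map zones to GPUs so that per-GPU work is balanced. A Static assignment maps zones round-robin regardless of size, while a Guided-style greedy assignment (largest remaining zone to the least-loaded GPU) adapts to uneven zone sizes:

```python
def static_schedule(zone_sizes, n_gpus):
    """Static round-robin: zone i is assigned to GPU i % n_gpus,
    ignoring zone sizes."""
    return [i % n_gpus for i in range(len(zone_sizes))]

def guided_schedule(zone_sizes, n_gpus):
    """Greedy size-aware heuristic (longest-processing-time style):
    visit zones from largest to smallest, assigning each to the
    currently least-loaded GPU."""
    loads = [0] * n_gpus
    assignment = [0] * len(zone_sizes)
    for i in sorted(range(len(zone_sizes)), key=lambda i: -zone_sizes[i]):
        g = loads.index(min(loads))  # least-loaded GPU so far
        assignment[i] = g
        loads[g] += zone_sizes[i]
    return assignment

def imbalance(zone_sizes, assignment, n_gpus):
    """Load imbalance factor: max per-GPU load over mean load.
    1.0 means a perfectly balanced distribution."""
    loads = [0] * n_gpus
    for i, g in enumerate(assignment):
        loads[g] += zone_sizes[i]
    return max(loads) / (sum(loads) / n_gpus)
```

For example, with zone sizes `[8, 1, 7, 2, 6, 3, 5, 4]` on 2 GPUs, the static schedule piles the large zones onto one GPU (imbalance ≈ 1.44), while the greedy schedule balances both GPUs at 18 units each (imbalance 1.0). The paper's point is that the best such trade-off also depends on the communication phases, which is why it compares these heuristics against an offline optimum computed with IBM CPLEX.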
