A Scalable Framework for Heterogeneous GPU-Based Clusters


Abstract

GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their high energy efficiency and much-improved single-node computational performance. However, there is little parallel software available that can efficiently utilize all the CPU cores and all the GPUs on such heterogeneous systems. On a heterogeneous cluster, the performance of a GPU (or a compute node) increases at a much faster rate than the performance of the PCI-Express connection (or the interconnection network), so that communication eventually becomes the bottleneck of the entire system. To overcome this bottleneck, we developed a multilevel partitioning and distribution method that guarantees a near-optimal communication volume. We have also extended heterogeneous tile algorithms to work on distributed-memory GPU clusters. Our main idea is to execute a serial program that generates hybrid-size tasks, and to follow a dataflow programming model to fire the tasks on different compute nodes. We then devised a distributed dynamic-scheduling runtime system to schedule tasks and to transfer data between hybrid CPU-GPU compute nodes transparently. The runtime system employs a novel distributed task-assignment protocol to resolve data dependencies between tasks without any coordination between processing units. The runtime system on each node consists of a number of CPU compute threads, a number of GPU compute threads, a task-generation thread, an MPI communication thread, and a CUDA communication thread. By overlapping computation and communication through dynamic scheduling, we attain a high performance of 75 TFlops for Cholesky factorization on the heterogeneous Keeneland system [25] using 100 nodes, each with twelve CPU cores and three GPUs. Moreover, our framework also attains high performance on distributed-memory clusters without GPUs, and on shared-system multi-GPUs.