Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes

Jo Gangwon; Nah Jeongho; Lee Jun; Kim Jungwon; Lee Jaejin

Abstract

OpenCL is an open standard for writing parallel applications for heterogeneous computing systems. Because its use is restricted to a single operating system instance, programmers need to combine OpenCL with MPI to program a heterogeneous cluster. In this paper, we introduce an MPI-OpenCL implementation of the LINPACK benchmark for a cluster with multi-GPU nodes. The LINPACK benchmark is one of the most widely used benchmark applications for evaluating high-performance computing systems. Our implementation is based on High Performance LINPACK (HPL) and uses the blocked LU decomposition algorithm. We point out that optimizations aimed at reducing CPU overhead are necessary to overcome the performance gap between the CPUs and the multiple GPUs. Our LINPACK implementation achieves 93.69 Tflops (46 percent of the theoretical peak) on the target cluster of 49 nodes, each containing two eight-core CPUs and four GPUs.
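The abstract names blocked LU decomposition as the core algorithm. As a rough illustration only, and not the authors' MPI-OpenCL code, the following is a minimal sequential sketch of a right-looking blocked LU factorization with partial pivoting in pure Python. The function names `blocked_lu` and `multiply_lu` and the block width `nb` are hypothetical choices made for this example; nothing here reflects how the paper distributes work across MPI processes or GPUs.

```python
def blocked_lu(a, nb):
    """Right-looking blocked LU with partial pivoting, done in place.

    a  : square matrix as a list of row lists; overwritten with the unit
         lower factor L (below the diagonal) and the upper factor U.
    nb : panel width (hypothetical tuning parameter for this sketch).
    Returns perm, where row i of the factored matrix came from row perm[i]
    of the input.
    """
    n = len(a)
    perm = list(range(n))
    for k in range(0, n, nb):
        end = min(k + nb, n)
        # Step 1: panel factorization (unblocked LU on columns k..end-1).
        for j in range(k, end):
            p = max(range(j, n), key=lambda i: abs(a[i][j]))
            if a[p][j] == 0.0:
                raise ValueError("matrix is singular")
            if p != j:
                a[j], a[p] = a[p], a[j]          # swap whole rows
                perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]               # multiplier, stored in L
                for c in range(j + 1, end):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: triangular solve, U12 = inverse(L11) * A12.
        for j in range(k, end):
            for i in range(j + 1, end):
                for c in range(end, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: trailing update, A22 -= L21 * U12 (a matrix-matrix multiply).
        for i in range(end, n):
            for j in range(k, end):
                lij = a[i][j]
                for c in range(end, n):
                    a[i][c] -= lij * a[j][c]
    return perm


def multiply_lu(a):
    """Rebuild L * U from the packed factors produced by blocked_lu."""
    n = len(a)
    return [[sum((1.0 if t == i else a[i][t]) * a[t][j]
                 for t in range(min(i, j) + 1))
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    original = [[2.0, 1.0, 1.0, 0.0],
                [4.0, 3.0, 3.0, 1.0],
                [8.0, 7.0, 9.0, 5.0],
                [6.0, 7.0, 9.0, 8.0]]
    work = [row[:] for row in original]
    perm = blocked_lu(work, nb=2)
    rebuilt = multiply_lu(work)
    err = max(abs(rebuilt[i][j] - original[perm[i]][j])
              for i in range(4) for j in range(4))
    print("row permutation:", perm)
    print("max reconstruction error:", err)
```

Step 3, the trailing update, is a matrix-matrix multiplication and accounts for most of the floating-point work in a blocked LU factorization, which is why block-based formulations suit accelerators well. How the paper actually splits the steps between CPUs and GPUs is not stated in the abstract.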


Bibliographic details

Source: Parallel and Distributed Systems, IEEE Transactions on, 2015, no. 7, pp. 1814-1825 (12 pages)
Authors: Jo Gangwon; Nah Jeongho; Lee Jun; Kim Jungwon; Lee Jaejin
Affiliation: Department of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea
Original format: PDF
Language: English
Keywords: Cluster; GPU; OpenCL; heterogeneous computing; high performance LINPACK

