Journal: IEEE Transactions on Parallel and Distributed Systems

Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes

Abstract

OpenCL is an open standard for writing parallel applications for heterogeneous computing systems. Because its use is restricted to a single operating system instance, programmers need to combine OpenCL with MPI to program a heterogeneous cluster. In this paper, we introduce an MPI-OpenCL implementation of the LINPACK benchmark for a cluster with multi-GPU nodes. The LINPACK benchmark is one of the most widely used benchmark applications for evaluating high-performance computing systems. Our implementation is based on High Performance LINPACK (HPL) and uses the blocked LU decomposition algorithm. We show that optimizations aimed at reducing CPU overhead are necessary to overcome the performance gap between the CPUs and the multiple GPUs. Our LINPACK implementation achieves 93.69 Tflops (46 percent of the theoretical peak) on the target cluster of 49 nodes, each node containing two eight-core CPUs and four GPUs.
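The abstract names the blocked LU decomposition algorithm without spelling it out. The following is a minimal single-process sketch of a right-looking blocked LU in plain Python, offered only as an illustration of the general technique and not as the authors' implementation. It assumes no pivoting (HPL itself uses partial pivoting), and the names `blocked_lu`, `reconstruct`, and the block size `nb` are hypothetical. Each iteration factors a panel, updates the row block to its right, and then applies a matrix-matrix update to the trailing submatrix, which is the step that accounts for most of the floating-point work.

```python
def blocked_lu(a, nb):
    """In-place right-looking blocked LU WITHOUT pivoting (a simplification).

    `a` is a square matrix as a list of lists of floats. On return, the strict
    lower triangle holds L (unit diagonal implied) and the upper triangle,
    including the diagonal, holds U.
    """
    n = len(a)
    for k in range(0, n, nb):
        e = min(k + nb, n)  # the current panel covers columns k .. e-1

        # 1. Panel factorization: unblocked LU on rows k..n-1, columns k..e-1.
        for j in range(k, e):
            pivot = a[j][j]
            for i in range(j + 1, n):
                a[i][j] /= pivot
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]

        # 2. Row-block update: solve L11 * U12 = A12 by forward substitution.
        for j in range(k, e):
            for i in range(j + 1, e):
                lij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= lij * a[j][c]

        # 3. Trailing update: A22 -= L21 * U12 (a matrix-matrix multiply).
        for i in range(e, n):
            for j in range(k, e):
                lij = a[i][j]
                for c in range(e, n):
                    a[i][c] -= lij * a[j][c]
    return a


def reconstruct(lu):
    """Multiply the packed L and U factors back together (for checking)."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out


if __name__ == "__main__":
    # Diagonally dominant test matrix, so skipping pivoting is safe here.
    n = 5
    original = [[float(n * 2 if i == j else 1 + (i + j) % 3) for j in range(n)]
                for i in range(n)]
    packed = blocked_lu([row[:] for row in original], nb=2)
    product = reconstruct(packed)
    err = max(abs(product[i][j] - original[i][j])
              for i in range(n) for j in range(n))
    print("max reconstruction error:", err)
```

The sketch uses a diagonally dominant input because, without pivoting, a zero or tiny diagonal entry would break the factorization. A production code would add row pivoting and distribute the panels across processes.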