Conference: International Conference on High Performance Computing for Computational Science

Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA



Abstract

We have been developing a proprietary interconnect technology, the Tightly Coupled Accelerators (TCA) architecture, to reduce communication latency and increase bandwidth between accelerators (GPUs) on different nodes. This paper presents an implementation of the Conjugate Gradient (CG) benchmark that uses TCA, together with performance evaluation results on HA-PACS/TCA, a proof-of-concept GPU cluster based on the TCA concept. The implementation is based on the CG benchmark of the NAS Parallel Benchmarks and is parallelized by a two-dimensional decomposition of the matrix data. For the small benchmark classes, using TCA improves communication performance over an implementation that uses MPI over InfiniBand. This study also shows that, to make effective use of the interconnect, a CG implementation with two-dimensional decomposition is better suited to TCA than one with one-dimensional decomposition.
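The abstract combines two ingredients: the CG iteration and a two-dimensional decomposition of the matrix. The following is a minimal serial sketch of how they fit together, not the authors' TCA or MPI code. It assumes a small dense symmetric positive-definite matrix instead of NPB's randomly generated sparse matrix, stops on a residual tolerance instead of the benchmark's fixed iteration count, and simulates a pr × pc process grid inside a single Python process. The helper names `split`, `matvec_2d`, `dot`, and `cg` are hypothetical.

```python
"""Serial sketch: CG whose matrix-vector product is computed over a simulated
pr x pc two-dimensional block decomposition (hypothetical helper names)."""


def split(n, parts):
    """Return `parts` contiguous (start, stop) index ranges covering range(n)."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def matvec_2d(A, x, pr, pc):
    """Return A @ x, computed block by block on a simulated pr x pc grid.

    Grid cell (i, j) owns block A[rows_i, cols_j] and the slice x[cols_j] and
    produces a partial result for rows_i. Summing the pc partials of grid
    row i (a row-wise reduction) gives y[rows_i]. In a distributed run, y
    would then be redistributed into the column-block layout (a transpose-like
    exchange) before the next product; the shared list y stands in for that.
    """
    n = len(x)
    y = [0.0] * n
    for r0, r1 in split(n, pr):                # grid row i
        partials = []
        for c0, c1 in split(n, pc):            # local work of cell (i, j)
            partials.append([sum(A[r][c] * x[c] for c in range(c0, c1))
                             for r in range(r0, r1)])
        for k, r in enumerate(range(r0, r1)):  # row-wise reduction
            y[r] = sum(part[k] for part in partials)
    return y


def dot(u, v):
    """Inner product; in a distributed run, a local sum plus a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def cg(A, b, pr=2, pc=2, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0.

    Assumption: stops on a residual tolerance, unlike the fixed iteration
    count of the NPB CG kernel.
    """
    x = [0.0] * len(b)
    r = list(b)                  # residual b - A x with x = 0
    p = list(r)                  # first search direction
    rho = dot(r, r)
    for _ in range(max_iter):
        if rho ** 0.5 < tol:
            break
        q = matvec_2d(A, p, pr, pc)
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x
```

With A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]] and b = [1, 2, 3], `cg(A, b)` should agree with the exact solution [2/9, 1/9, 13/9] to within rounding, for any grid shape. The layouts differ in what each process touches. In a one-dimensional row decomposition, every process needs the whole vector before its local product. In the two-dimensional layout, a cell needs only the slice of x for its column block, at the cost of a reduction across its grid row.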
