Conference: International Conference on High Performance Computing for Computational Science

Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA

Abstract

We have been developing a proprietary interconnect technology, the Tightly Coupled Accelerators (TCA) architecture, to reduce communication latency and increase bandwidth between accelerators (GPUs) on different nodes. This paper presents a Conjugate Gradient (CG) benchmark implementation that uses TCA, together with a performance evaluation on the HA-PACS/TCA system, a proof-of-concept GPU cluster based on the TCA concept. The implementation is based on the CG benchmark in the NAS Parallel Benchmarks and is parallelized by a two-dimensional decomposition of the matrix data. For small benchmark classes, using TCA improves communication performance over an implementation that uses MPI/InfiniBand. The study also shows that the CG implementation with two-dimensional decomposition makes better use of the TCA interconnect than a CG implementation with one-dimensional decomposition.
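
As a rough illustration of the two-dimensional decomposition named in the abstract, the sketch below simulates it serially in plain Python. It is not the paper's implementation: it uses a small dense matrix instead of the sparse NAS CG matrix, makes no MPI, GPU, or TCA calls, and the names `matvec_2d` and `cg` are hypothetical. Each simulated "process" (i, j) in a `pr` x `pc` grid owns one block of the matrix, computes a partial product, and the partials are summed along each process row. On a real cluster, that row-wise sum is where inter-node communication would occur.

```python
def matvec_2d(A, x, pr, pc):
    """Simulate y = A x with A split into blocks over a pr x pc process grid."""
    n = len(x)
    assert n % pr == 0 and n % pc == 0, "grid must divide the matrix size"
    rb, cb = n // pr, n // pc          # rows and columns per block
    y = [0.0] * n
    for i in range(pr):                # process row
        for j in range(pc):            # process column
            # Simulated process (i, j) owns rows [i*rb, (i+1)*rb) and
            # columns [j*cb, (j+1)*cb) of A, plus the matching slice of x.
            partial = [
                sum(A[r][c] * x[c] for c in range(j * cb, (j + 1) * cb))
                for r in range(i * rb, (i + 1) * rb)
            ]
            # Row-wise reduction: on a cluster this sum would cross nodes.
            for k, v in enumerate(partial):
                y[i * rb + k] += v
    return y


def cg(A, b, pr, pc, iters=25, tol=1e-12):
    """Plain conjugate gradient for A x = b, using the 2D-decomposed product."""
    x = [0.0] * len(b)
    r = b[:]                           # residual for the initial guess x = 0
    p = r[:]                           # first search direction
    rho = sum(v * v for v in r)
    for _ in range(iters):
        if rho < tol:                  # squared residual norm is small enough
            break
        q = matvec_2d(A, p, pr, pc)
        alpha = rho / sum(pi * qi for pi, qi in zip(p, q))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = sum(v * v for v in r)
        p = [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x


# Small symmetric positive definite test system (assumed for illustration).
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
expected = [1.0, 2.0, 3.0, 4.0]
b = matvec_2d(A, expected, 2, 2)       # right-hand side built from a known x
x_sol = cg(A, b, 2, 2)
```

With a one-dimensional (row-only) decomposition, `pc` would be 1 and every process would need the whole vector `x`. The two-dimensional grid instead keeps each exchange within a process row or column, which is the communication pattern the abstract evaluates over TCA.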
