Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA


Abstract

We have been developing a proprietary interconnect technology, the Tightly Coupled Accelerators (TCA) architecture, to reduce communication latency and increase bandwidth between accelerators (GPUs) on different nodes. This paper presents an implementation of the Conjugate Gradient (CG) benchmark that uses TCA and reports its performance on the HA-PACS/TCA system, a proof-of-concept GPU cluster based on the TCA concept. The implementation is based on the CG benchmark in the NAS Parallel Benchmarks and is parallelized by a two-dimensional decomposition of the matrix data. For small benchmark classes, using TCA improves communication performance compared with an implementation that uses MPI over InfiniBand. The study also shows that the CG implementation with two-dimensional decomposition is better suited to exploiting the TCA interconnect than a CG implementation with one-dimensional decomposition.
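To make the idea of a two-dimensional decomposition concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and not the NAS Parallel Benchmarks kernel: it uses a small dense matrix, runs in a single process, and involves no GPU, TCA, or MPI calls. The names `matvec_2d`, `cg`, `prow`, and `pcol` are hypothetical and chosen only for illustration. The sketch shows how the matrix-vector product inside CG can be split into blocks owned by a simulated `prow` x `pcol` process grid, where each block needs only a slice of the input vector and the partial results are summed along each grid row.

```python
def matvec_2d(A, x, prow, pcol):
    """Compute y = A x block by block on a simulated prow x pcol process grid."""
    n = len(A)
    row_blocks = [(i * n // prow, (i + 1) * n // prow) for i in range(prow)]
    col_blocks = [(j * n // pcol, (j + 1) * n // pcol) for j in range(pcol)]
    y = [0.0] * n
    for r0, r1 in row_blocks:
        for c0, c1 in col_blocks:
            # Simulated process (r, c) owns A[r0:r1][c0:c1] and reads only x[c0:c1].
            for i in range(r0, r1):
                partial = sum(A[i][k] * x[k] for k in range(c0, c1))
                # Assumption: in a distributed run this accumulation would be
                # a reduction across the processes of one grid row.
                y[i] += partial
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(A, b, prow=2, pcol=2, tol=1e-10, max_iter=100):
    """Unpreconditioned conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rho = dot(r, r)
    for _ in range(max_iter):
        if rho ** 0.5 < tol:
            break
        q = matvec_2d(A, p, prow, pcol)
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        p = [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x


# Small symmetric positive definite test system (illustrative data only).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
b = [6.0, 12.0, 18.0, 19.0]
x = cg(A, b, prow=2, pcol=2)
```

Calling `matvec_2d` with `pcol=1` gives the one-dimensional (row-wise) decomposition for comparison: each simulated process then needs the whole input vector, whereas in the two-dimensional case it needs only its column slice.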


Publication details

Source
International Conference on High Performance Computing for Computational Science | 2017 | 272p | 11 pages
Authors
Kazuya Matsumoto; Norihisa Fujita; Toshihiro Hanawa; Taisuke Boku
Original format: PDF
Chinese Library Classification: TP301-53

