
High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning



Abstract

Motivated by the high computational power and low price-to-performance ratio of GPUs, GPU-accelerated clusters are being built for high performance scientific computing. In this work, we propose a scalable implementation of a Conjugate Gradient (CG) solver for unstructured matrices on a GPU-extended cluster, where each cluster node has multiple GPUs. The basic computations of the solver are performed on the GPUs, while communication is managed by the CPU. For sparse matrix-vector multiplication, the most time-consuming operation, the solver selects the fastest among several high performance kernels running on the GPUs. On a GPU-extended cluster, scalability is harder to obtain than on a traditional CPU cluster, because GPUs are very fast compared to CPUs: since computation on the GPUs is faster, GPU-extended clusters demand faster communication between compute units. To achieve scalability, we adopt hypergraph-partitioning models, which are state-of-the-art models for communication reduction and load balancing in parallel sparse iterative solvers. We implement a hierarchical partitioning model that better optimizes the underlying heterogeneous system. In our experiments, we obtain up to 94 Gflops of double-precision CG performance using 64 NVIDIA Tesla GPUs on 32 nodes.
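To make the structure of the solver concrete, below is a minimal, hedged sketch of a plain CG iteration (not the authors' implementation). The `spmv` callback stands in for the GPU sparse matrix-vector kernel the abstract describes as the dominant cost; in the paper's setting, `spmv` and the vector operations would run on GPUs while the host manages inter-node communication. All names here are illustrative.

```python
import numpy as np

def conjugate_gradient(spmv, b, x0, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive-definite system.

    spmv(v) computes A @ v; in a multi-GPU solver this call (and the
    dot products / axpy updates below) would execute on the GPUs,
    with the CPU handling halo exchange between partitions.
    """
    x = x0.copy()
    r = b - spmv(x)          # initial residual
    p = r.copy()             # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = spmv(p)         # dominant cost: sparse matrix-vector product
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# usage: a small SPD tridiagonal test system
n = 100
A = (np.diag(np.full(n, 2.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))
b = np.ones(n)
x = conjugate_gradient(lambda v: A @ v, b, np.zeros(n))
assert np.allclose(A @ x, b, atol=1e-6)
```

The sketch uses a dense test matrix only for self-containedness; the point of the paper is precisely that for large unstructured sparse matrices the SpMV step and its communication pattern dominate, which is what the hypergraph partitioning optimizes.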

Bibliographic details

  • Source
    Computer science | 2010, Issue 2 | pp. 83-91 | 9 pages
  • Author affiliations

    Tokyo Institute of Technology, Ookayama 2-12-1, Meguro-ku, Tokyo 152-8552, Japan;

    Tokyo Institute of Technology, Ookayama 2-12-1, Meguro-ku, Tokyo 152-8552, Japan;

    Tokyo Institute of Technology, Ookayama 2-12-1, Meguro-ku, Tokyo 152-8552, Japan; National Institute of Informatics, Hitotsubashi 4-5-6, Chiyoda-ku, Tokyo 101-8430, Japan;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    GPU computing; GPU cluster; conjugate gradients; hypergraph partitioning;

  • Added to database: 2022-08-17 13:50:49

