Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors

Anzt Hartwig; Dongarra Jack; Flegar Goran; Quintana-Orti Enrique S.

Abstract

In this work, we address the efficient realization of block-Jacobi preconditioning on graphics processing units (GPUs). This task requires the solution of a collection of small and independent linear systems. To fully realize this implementation, we develop a variable-size batched matrix inversion kernel that uses Gauss-Jordan elimination (GJE) along with a variable-size batched matrix-vector multiplication kernel that transforms the linear systems' right-hand sides into the solution vectors. Our kernels make heavy use of the increased register count and the warp-local communication associated with newer GPU architectures. Moreover, in the matrix inversion, we employ an implicit pivoting strategy that migrates the workload (i.e., operations) to the place where the data resides instead of moving the data to the executing cores. We complement the matrix inversion with extraction and insertion strategies that allow the block-Jacobi preconditioner to be set up rapidly. The experiments on NVIDIA's K40 and P100 architectures reveal that our variable-size batched matrix inversion routine outperforms the CUDA basic linear algebra subroutine (cuBLAS) library functions that provide the same (or even less) functionality. We also show that the preconditioner setup and preconditioner application cost can be somewhat offset by the faster convergence of the iterative solver. © 2018 Elsevier B.V. All rights reserved.
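
The ideas named in the abstract can be illustrated with a small sequential sketch. It is not the authors' GPU implementation: it is plain Python, it works on an augmented matrix [A | I] rather than in registers, and it loops over blocks instead of processing them as a batch. All function names (gje_inverse_implicit_pivoting, block_jacobi_setup, block_jacobi_apply) are hypothetical and chosen only for this illustration. The point it tries to show is implicit pivoting: rows are never swapped during elimination; the code only records which row served as the pivot for each column and reads the inverse out in that order at the end.

```python
def gje_inverse_implicit_pivoting(a):
    """Invert a small square matrix by Gauss-Jordan elimination.

    Illustrative sketch: rows are never physically swapped. We only
    remember which row acted as pivot for each column (implicit pivoting).
    """
    n = len(a)
    # Augmented matrix [A | I]; row operations turn the right half into M
    # such that M * A is a permutation matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    used = [False] * n          # rows already chosen as a pivot row
    pivot_row_of = [-1] * n     # pivot_row_of[k] = row chosen for column k
    for k in range(n):
        # Pick the largest entry in column k among rows not yet used.
        r = max((i for i in range(n) if not used[i]),
                key=lambda i: abs(aug[i][k]))
        if aug[r][k] == 0.0:
            raise ValueError("matrix is singular")
        used[r] = True
        pivot_row_of[k] = r
        d = aug[r][k]
        aug[r] = [v / d for v in aug[r]]
        # Eliminate column k from every other row (Gauss-Jordan step).
        for i in range(n):
            if i != r and aug[i][k] != 0.0:
                f = aug[i][k]
                aug[i] = [vi - f * vr for vi, vr in zip(aug[i], aug[r])]
    # Row k of the inverse sits in the row that was the pivot for column k.
    return [aug[pivot_row_of[k]][n:] for k in range(n)]


def block_jacobi_setup(a, block_sizes):
    """Extract the diagonal blocks of a dense matrix and invert each one."""
    inverses, start = [], 0
    for size in block_sizes:
        block = [row[start:start + size] for row in a[start:start + size]]
        inverses.append(gje_inverse_implicit_pivoting(block))
        start += size
    return inverses


def block_jacobi_apply(inverses, r):
    """Apply the preconditioner: one small matrix-vector product per block."""
    z, start = [], 0
    for inv in inverses:
        size = len(inv)
        seg = r[start:start + size]
        z.extend(sum(inv[i][j] * seg[j] for j in range(size))
                 for i in range(size))
        start += size
    return z


if __name__ == "__main__":
    A = [[0.0, 2.0, 0.0],
         [1.0, 1.0, 0.0],
         [0.0, 0.0, 4.0]]
    blocks = block_jacobi_setup(A, [2, 1])   # block sizes may differ
    print(block_jacobi_apply(blocks, [2.0, 3.0, 8.0]))
```

The first diagonal block has a zero in its top-left position, so the pivot for column 0 is taken from row 1 without moving any data. Splitting the work into a setup phase (inversion) and an apply phase (matrix-vector products) mirrors the two kernels the abstract describes.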

Bibliographic record

Source
Parallel Computing | 2019, Issue 1 | pp. 131-146 | 16 pages
Authors
Anzt Hartwig; Dongarra Jack; Flegar Goran; Quintana-Orti Enrique S.
Author affiliations

Karlsruhe Inst Technol, Karlsruhe, Germany;

Oak Ridge Natl Lab, Oak Ridge, TN USA;

Univ Tennessee, ICL, Knoxville, TN USA;

Univ Jaume 1, Dept Ingn & Ciencia Comp, Castellon de La Plana, Spain;

Univ Manchester, Sch Comp Sci, Manchester, Lancs, England;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language of text: English
Keywords
Batched algorithms; Matrix inversion; Gauss-Jordan elimination; Block-Jacobi; Sparse linear systems; Graphics processor;

Similar literature

1. Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning [J]. Hartwig Anzt, Jack Dongarra, Goran Flegar. Procedia Computer Science, 2017, Issue 1.
2. Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software [J]. Goran Flegar, Hartwig Anzt, Terry Cojean. ACM Transactions on Mathematical Software, 2021, Issue 2.
3. Replicated Computational Results (RCR) Report for 'Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software' [J]. Sarah Osborn. ACM Transactions on Mathematical Software, 2021, Issue 2.
4. Variable-Size Batched LU for Small Matrices and its Integration into Block-Jacobi Preconditioning [C]. Hartwig Anzt, Jack Dongarra, Goran Flegar. International Conference on Parallel Processing, 2017.
5. Implementing a Preconditioned Iterative Linear Solver Using Massively Parallel Graphics Processing Units [D]. Asgari Kamiabad, Amirhassan. 2011.
6. Real time implementation of anti-scatter grid artifact elimination method for high resolution x-ray imaging CMOS detectors using Graphics Processing Units (GPUs) [O]. R. Rana, S.V. Setlur Nagesh, D.R. Bednarek.
7. Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning [O]. Anzt, Hartwig; Dongarra, Jack; Flegar, Goran. 2017.
8. Equivalence of Gaussian Elimination and Gauss-Jordan Reduction in Solving Linear Equations [R]. Tsao, N. 1989.
