A New GPU Algorithm to Compute a Level Set-Based Analysis for the Parallel Solution of Sparse Triangular Systems


Abstract

A myriad of problems in science and engineering involve the solution of sparse triangular linear systems. These systems arise frequently as part of direct and iterative solvers for linear systems and eigenvalue problems, and hence can be considered a key building block of sparse numerical linear algebra. For this reason, their parallel solution has been studied exhaustively since the early days, and efficient implementations of this kernel can be found for almost every hardware platform. In the GPU context, the most widespread implementation of this kernel is the one distributed in the NVIDIA CUSPARSE library, which relies on a preprocessing stage to aggregate the unknowns of the triangular system into level sets. This determines an execution schedule for the solution of the system, in which the level sets have to be processed sequentially while the unknowns that belong to one level set can be solved in parallel. One of the disadvantages of the CUSPARSE implementation is that this preprocessing stage is often extremely slow in comparison to the runtime of the solving phase. In this work, we present a parallel GPU algorithm that computes the same level sets as CUSPARSE but takes significantly less runtime. Our experiments on a set of matrices from the SuiteSparse collection show acceleration factors of up to 44×. Additionally, we provide a routine capable of solving a triangular linear system in the same pass used to calculate the level sets, yielding important performance benefits.
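The level-set schedule described in the abstract can be illustrated with a small sequential sketch. This is not the authors' GPU algorithm and not the CUSPARSE implementation; it only shows what the analysis computes and how the resulting schedule is used. It assumes a lower triangular matrix stored in CSR form with every diagonal entry present, and the function names `level_sets` and `solve_by_levels` are hypothetical.

```python
def level_sets(indptr, indices):
    """Assign a level to every row of a lower triangular CSR matrix.

    A row with no off-diagonal entries is in level 0. Otherwise its level
    is one more than the highest level among the rows it depends on.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:  # off-diagonal entry: x[i] depends on x[j]
                level[i] = max(level[i], level[j] + 1)
    return level


def solve_by_levels(indptr, indices, data, b, level):
    """Solve L x = b, visiting the level sets in increasing order."""
    x = [0.0] * len(b)
    for lev in range(max(level, default=-1) + 1):
        # Rows of one level are mutually independent. A GPU version would
        # process them in parallel; this sketch simply loops over them.
        rows = [i for i, l in enumerate(level) if l == lev]
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag  # assumes the diagonal entry is stored
    return x


# Illustrative 4x4 lower triangular system (made-up values):
# [2 0 0 0]       [ 2]
# [0 1 0 0]  x =  [ 2]
# [1 0 4 0]       [13]
# [0 3 1 5]       [29]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0, 5.0]
b = [2.0, 2.0, 13.0, 29.0]

levels = level_sets(indptr, indices)
x = solve_by_levels(indptr, indices, data, b, levels)
print(levels)  # [0, 0, 1, 2]
print(x)       # [1.0, 2.0, 3.0, 4.0]
```

Rows 0 and 1 have no dependencies and share level 0, row 2 waits for row 0, and row 3 waits for rows 1 and 2. The number of levels therefore bounds the number of sequential steps, while the size of each level bounds the available parallelism.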

Bibliographic details

Source
IEEE International Parallel and Distributed Processing Symposium | 2018 | pp. 920-929 | 10 pages
Authors
Ernesto Dufrechou; Pablo Ezzatti;
Original format: PDF
Keywords
Sparse matrices; Level set; Graphics processing units; Linear systems; Kernel; Runtime; Libraries;

Similar literature

1. Using analysis information in the synchronization-free GPU solution of sparse triangular systems [J]. Concurrency, practice and experience. 2020, No. 10
2. GPU-based parallel algorithms for sparse nonlinear systems [J]. V. Galiano, H. Migallon, V. Migallon. Journal of Parallel and Distributed Computing. 2012, No. 9
3. Parallel algorithms for solving linear systems with sparse triangular matrices [J]. Jan Mayer. Computing. Archives for Informatics and Numerical Computation. 2009, No. 4
4. A new GPU algorithm to compute a level set-based analysis for the parallel solution of sparse triangular systems [C]. Ernesto Dufrechou, Pablo Ezzatti. IEEE International Parallel and Distributed Processing Symposium. 2018
5. Multilevel preconditioning methods for the parallel iterative solution of large, sparse systems of equations [D]. Turner, Wesley D. 2001
6. GPU computing with Kaczmarz’s and other iterative algorithms for linear systems [O]. Joseph M. Elble, Nikolaos V. Sahinidis, Panagiotis Vouzis
7. Parallel Algorithms for Solving Linear Systems with Sparse Triangular Matrices [O]. Jan Mayer. 2011
8. Optimal Parallel Solution of Sparse Triangular Systems [R]. Alvarado, F. L., Schreiber, R. 1990
