Conference: IEEE International Parallel and Distributed Processing Symposium

A New GPU Algorithm to Compute a Level Set-Based Analysis for the Parallel Solution of Sparse Triangular Systems

Abstract

A myriad of problems in science and engineering involve the solution of sparse triangular linear systems. These systems arise frequently as part of direct and iterative solvers for linear systems and eigenvalue problems, and hence can be considered a key building block of sparse numerical linear algebra. For this reason, their parallel solution has been studied exhaustively since the early days, and efficient implementations of this kernel can be found for almost every hardware platform. In the GPU context, the most widespread implementation of this kernel is the one distributed in the NVIDIA CUSPARSE library, which relies on a preprocessing stage to aggregate the unknowns of the triangular system into level sets. This determines an execution schedule for the solution of the system, in which the level sets have to be processed sequentially while the unknowns that belong to one level set can be solved in parallel. One of the disadvantages of the CUSPARSE implementation is that this preprocessing stage is often extremely slow in comparison to the runtime of the solving phase. In this work, we present a parallel GPU algorithm that is able to compute the same level sets as CUSPARSE but in significantly less time. Our experiments on a set of matrices from the SuiteSparse collection show acceleration factors of up to 44×. Additionally, we provide a routine capable of solving a triangular linear system in the same pass used to calculate the level sets, yielding important performance benefits.
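To make the notion of level sets concrete, the following is a minimal sequential sketch in Python. It is not the GPU algorithm presented in the paper, nor the CUSPARSE analysis routine. It assumes a lower triangular matrix stored in CSR format with a nonzero diagonal entry in every row, and the function names (`level_sets`, `group_by_level`, `solve_lower`) are hypothetical, chosen only for illustration. The level of an unknown is one more than the highest level among the unknowns it depends on, and unknowns with no dependencies are at level 0.

```python
def level_sets(indptr, indices):
    """Level of each row of a sparse lower triangular matrix in CSR format.

    Assumption: column indices within a row may appear in any order, and
    only strictly lower entries (j < i) count as dependencies.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        deps = [j for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        # A row with no dependencies gets level 0.
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    return level


def group_by_level(level):
    """Group row indices by level; each group is one level set."""
    sets = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        sets[lvl].append(i)
    return sets


def solve_lower(indptr, indices, data, b):
    """Solve L x = b by processing the level sets in order.

    Rows inside one level set are independent of each other, so the inner
    loop over `rows` is the part a parallel implementation could distribute.
    """
    n = len(indptr) - 1
    x = [0.0] * n
    for rows in group_by_level(level_sets(indptr, indices)):
        for i in rows:
            acc, diag = b[i], None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# Illustrative 4x4 lower triangular matrix (made-up values):
# [2 0 0 0]
# [0 1 0 0]
# [1 0 1 0]
# [0 1 1 1]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

levels = level_sets(indptr, indices)
schedule = group_by_level(levels)
x = solve_lower(indptr, indices, data, [2.0, 3.0, 4.0, 10.0])
```

In this example, rows 0 and 1 have no dependencies and form the first level set, row 2 depends on row 0, and row 3 depends on rows 1 and 2, so the schedule has three levels that must be processed one after another.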
