Multi-GPU Parallelization of Nested Factorization for Solving Large Linear Systems

Abstract

We describe a massively parallel Nested Factorization (NF) linear solver for large systems of equations. NF is a powerful classic preconditioner that is receiving renewed attention because of its potential on emerging parallel architectures, especially Graphics Processing Units (GPUs). We build on the Massively Parallel NF (MPNF) framework described by Appleyard et al. (2011). MPNF divides the three-dimensional grid into 'kernels' and assigns each kernel a color such that no neighboring kernels share the same color. Parallelism is exploited by operating on all the kernels of a given color simultaneously and cycling through the NF operations color by color. Our MPNF algorithm is designed with special attention to asynchronous CPU-to-GPU memory transfer during the setup phase. Moreover, a CUDA-based BiCGStab Krylov solver and a customized 'reduction kernel' with greater bandwidth are used. The key features of the algorithm are: (1) a special ordering of the matrix elements that maximizes coalesced access to GPU global memory and speeds up kernel execution severalfold, (2) application of twisted factorization, which increases the number of concurrent threads at no additional cost, and (3) extension to multiple GPUs by first solving the so-called halo region in each GPU and overlapping peer-to-peer memory transfer between GPUs with the solution of the interior regions. The GPU-based NF solver is demonstrated on several large problems, and we break down the performance of all the algorithmic components. For the SPE10 model (highly heterogeneous, with over one million cells) on a 512-core Tesla M2090 GPU, our implementation achieves a speedup of 26 for single-precision and 19 for double-precision computations compared with a single core of the Xeon X5660 CPU. Moreover, the (3072-core) 6-GPU solution of a highly refined SPE10 model (26.9 million cells) is more than five times faster than the single-GPU solution.
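The abstract does not spell out how twisted factorization is applied inside NF, so the following is only a minimal sketch of the general idea on a scalar tridiagonal system. It assumes the usual three-array storage (`a` sub-diagonal, `b` diagonal, `c` super-diagonal) and a meeting row at the midpoint; the function name and these conventions are illustrative, not taken from the paper. Elimination starts from both ends of the system at once, and the two sweeps touch disjoint rows, so they could be assigned to two concurrent threads while the total arithmetic stays about the same as in a one-directional sweep.

```python
def twisted_tridiagonal_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0], c[-1] unused).

    Illustrative twisted ("both ends toward the middle") elimination.
    No pivoting: assumes a well-conditioned, e.g. diagonally dominant, system.
    """
    n = len(b)
    m = n // 2                 # meeting row (assumed midpoint)
    cp = [0.0] * n             # scaled super-diagonal, rows above m
    ap = [0.0] * n             # scaled sub-diagonal, rows below m
    y = [0.0] * n              # modified right-hand side

    # Downward sweep over rows 0 .. m-1 (independent of the upward sweep).
    for i in range(m):
        denom = b[i] - (a[i] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = c[i] / denom
        y[i] = (d[i] - (a[i] * y[i - 1] if i > 0 else 0.0)) / denom

    # Upward sweep over rows n-1 .. m+1 (independent of the downward sweep).
    for i in range(n - 1, m, -1):
        denom = b[i] - (c[i] * ap[i + 1] if i < n - 1 else 0.0)
        ap[i] = a[i] / denom
        y[i] = (d[i] - (c[i] * y[i + 1] if i < n - 1 else 0.0)) / denom

    # Meeting row: both neighbours have been eliminated in terms of x[m].
    denom, rhs = b[m], d[m]
    if m > 0:
        denom -= a[m] * cp[m - 1]
        rhs -= a[m] * y[m - 1]
    if m < n - 1:
        denom -= c[m] * ap[m + 1]
        rhs -= c[m] * y[m + 1]

    x = [0.0] * n
    x[m] = rhs / denom
    # Substitute outward from the meeting row; again two independent loops.
    for i in range(m - 1, -1, -1):
        x[i] = y[i] - cp[i] * x[i + 1]
    for i in range(m + 1, n):
        x[i] = y[i] - ap[i] * x[i - 1]
    return x


if __name__ == "__main__":
    # Small diagonally dominant test system (values chosen for illustration).
    a = [0.0, -1.0, -1.0, -1.0, -1.0]
    b = [4.0] * 5
    c = [-1.0, -1.0, -1.0, -1.0, 0.0]
    d = [2.0, 4.0, 6.0, 8.0, 16.0]
    print(twisted_tridiagonal_solve(a, b, c, d))
```

In this serial Python version the two sweeps simply run one after the other; the point is only that neither reads anything the other writes, which is what allows a GPU implementation to run them concurrently.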

Bibliographic details

Source: Society of Petroleum Engineers Reservoir Simulation Symposium, 2013, 19 pages
Authors: Y. Zhou; H. A. Tchelepi
Original format: PDF
Chinese Library Classification: Oilfield development (reservoir engineering)

Similar literature

1. GPU Parallelization Nested Decomposition Method for Solving Large Linear Systems in Reservoir Numerical Simulation [J]. Shi Xin, Di Yuan. Earth Sciences Research Journal. 2019, No. 3
2. A parallel nonlinear multigrid solver for unsteady incompressible flow simulation on multi-GPU cluster [J]. Shi Xiaolei, Agrawal Tanmay, Lin Chao-An. Journal of Computational Physics. 2020, No. 1
3. Three-Layer Factorized Difference Schemes and Parallel Algorithms for Solving the System of Linear Parabolic Equations with Mixed Derivatives and Variable Coefficients [J]. Criado-Aldeanueva F., Davitashvili T., Meladze H. Applied and Computational Mathematics, an International Journal. 2016, No. 1
4. Multi-GPU Parallelization of Nested Factorization for Solving Large Linear Systems [C]. Y. Zhou, H. A. Tchelepi. Society of Petroleum Engineers Reservoir Simulation Symposium. 2013
5. Solving large sparse systems of nonlinear equations and nonlinear least squares problems using tensor methods on sequential and parallel computers [D]. Bouaricha, Ali. 1992
6. NMF-mGPU: non-negative matrix factorization on multi-GPU systems [O]. Edgardo Mejía-Roa, Daniel Tabas-Madrid, Javier Setoain. 2015
7. Parallel factorizations and parallel solvers for tridiagonal linear systems [O]. Amodio P., Brugnano L. 1992
