Journal: Mathematical Problems in Engineering: Theory, Methods and Applications

Developing a Multi-GPU-Enabled Preconditioned GMRES with Inexact Triangular Solves for Block Sparse Matrices

Abstract

Solving triangular systems is a building block of the preconditioned GMRES algorithm. Inexact preconditioning is attractive on accelerators because it exposes a high degree of parallelism. In this paper, we propose and implement an iterative, inexact block triangular solve on multiple GPUs within the PETSc framework. In addition, by developing a distributed block sparse matrix-vector multiplication procedure and investigating optimized vector operations, we build a multi-GPU-enabled preconditioned GMRES with the block Jacobi preconditioner. The implementation employs the GPU-Direct technique to avoid host-device memory copies. The preconditioning steps provided by PETSc's own structure and by the cuSPARSE library are also investigated for performance comparison. Experiments show that the developed GMRES with inexact preconditioning on 8 GPUs achieves up to a 4.4x speedup over the CPU-only implementation with exact preconditioning using 8 MPI processes.
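
To illustrate what an iterative, inexact triangular solve can look like, here is a minimal sketch that assumes Jacobi sweeps are used as the iteration. The abstract does not state which iteration the authors use, so this is one common choice rather than the paper's implementation. It works on a small dense, scalar lower-triangular matrix in pure Python, with no blocks, sparsity, MPI, or GPU code. The function names `jacobi_lower_solve` and `forward_substitution` are hypothetical.

```python
def forward_substitution(L, b):
    """Exact solve of L x = b for lower-triangular L (sequential in i)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Row i needs x[0..i-1] from this same pass: an inherently serial chain.
        x[i] = (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
    return x


def jacobi_lower_solve(L, b, sweeps):
    """Approximate solve of L x = b using a fixed number of Jacobi sweeps.

    Assumption for illustration: x_{k+1} = D^{-1} (b - (L - D) x_k),
    where D is the diagonal of L, starting from x_0 = D^{-1} b.
    """
    n = len(b)
    x = [b[i] / L[i][i] for i in range(n)]
    for _ in range(sweeps):
        # Each new entry reads only the previous iterate, so all rows
        # could be updated independently (in parallel on an accelerator).
        x = [
            (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
            for i in range(n)
        ]
    return x
```

Because the strictly lower part of a triangular matrix is nilpotent, this iteration reproduces the exact solution after at most n - 1 sweeps for an n-by-n system. Stopping after only a few sweeps gives an approximate solution, trading accuracy for row-level parallelism.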
