Journal of Parallel and Distributed Computing

Research on the conjugate gradient algorithm with a modified incomplete Cholesky preconditioner on GPU

Abstract

In this study, we identify parallelism in the forward/backward substitutions (FBS) for two cases and, on that basis, propose an efficient preconditioned conjugate gradient algorithm with the modified incomplete Cholesky preconditioner on the GPU (GPUMICPCGA). GPUMICPCGA has the following distinct characteristics: (1) the vector operations are optimized by grouping several of them into single kernels; (2) a new inner-product kernel and a new, highly optimized sparse matrix-vector multiplication kernel are presented; and (3) an efficient parallel implementation of FBS on the GPU (GPUFBS) is suggested for the two cases. Numerical results show that the proposed kernels outperform the corresponding ones in CUBLAS or CUSPARSE, and that GPUFBS is almost 3 times faster than the FBS implementation that uses the CUSPARSE library. Furthermore, GPUMICPCGA performs better than its counterpart implemented with the CUBLAS and CUSPARSE libraries.
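
To make the role of FBS concrete, the following is a minimal sequential sketch of a preconditioned conjugate gradient loop in plain Python. It is not the GPU method summarized above. It assumes a plain zero-fill incomplete Cholesky factor, IC(0), in place of the modified variant, stores the matrix as dense lists for brevity, and uses hypothetical function names (`ic0`, `forward_sub`, `backward_sub`, `pcg`). The point is only to show where the two triangular solves sit in each iteration, next to the inner products and the matrix-vector product.

```python
import math

def ic0(A):
    """Zero-fill incomplete Cholesky (assumption: plain IC(0), not the modified variant).
    L is computed only where A has a nonzero entry, so no fill-in is created."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            if A[i][j] == 0.0:
                continue  # keep the sparsity pattern of A
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b. Row i depends on all earlier rows, hence the sequential loop."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def backward_sub(L, y):
    """Solve L^T z = y, walking the rows in reverse order."""
    n = len(y)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def pcg(A, b, tol=1e-10, max_iter=100):
    """Preconditioned CG with M = L L^T; returns (solution, iterations used)."""
    L = ic0(A)
    x = [0.0] * len(b)
    r = b[:]                                   # residual for the zero initial guess
    z = backward_sub(L, forward_sub(L, r))     # FBS: apply the preconditioner
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = backward_sub(L, forward_sub(L, r))  # FBS again, once per iteration
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Small 2D Laplacian on a 2x2 grid; the right-hand side is chosen so x = (1, 1, 1, 1).
A = [[4.0, -1.0, -1.0, 0.0],
     [-1.0, 4.0, 0.0, -1.0],
     [-1.0, 0.0, 4.0, -1.0],
     [0.0, -1.0, -1.0, 4.0]]
b = [2.0, 2.0, 2.0, 2.0]
x, iters = pcg(A, b)
```

In this sketch each row of a triangular solve waits on the rows before it, which is the data dependence that makes FBS the hard part to parallelize. The vector updates and inner products inside the loop are the kind of operations that characteristic (1) describes grouping into single kernels.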