The restarted generalized minimal residual method (GMRES) is one of the most widely used iterative methods for solving large-scale linear systems, offering fast convergence and good stability. This paper implements a parallel GMRES on the GPU using CUDA. In particular, the sparse matrix-vector multiplication is optimized by combining coalesced memory access with shared memory, which significantly improves performance. On large-scale data sets, the parallel GMRES running on a GeForce GTX 260 achieves an average speed-up of more than 40 times over the traditional GMRES on an Intel Core 2 Quad CPU Q9400 @ 2.66 GHz, and more than 20 times over the same code on an Intel Core i7 CPU 920 @ 2.67 GHz.
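For readers unfamiliar with the algorithm, the following is a minimal serial sketch of restarted GMRES in plain Python. It is not the paper's CUDA implementation. The CSR storage format, modified Gram-Schmidt orthogonalization, Givens rotations, the function names, and the default restart length `m=20` are all assumptions made for illustration; the abstract specifies none of them. In a GPU version, `csr_matvec`, `dot`, and the vector updates are the operations that would typically become kernels, with the sparse matrix-vector product being the one the abstract singles out for optimization.

```python
import math


def csr_matvec(n, row_ptr, col_idx, vals, x):
    """Return y = A x for an n-row sparse matrix A in CSR form (assumed format)."""
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gmres_restarted(n, row_ptr, col_idx, vals, b, m=20, tol=1e-10, max_restarts=50):
    """Solve A x = b with GMRES(m), starting from x = 0 (illustrative sketch)."""
    x = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_restarts):
        # True residual at the start of each restart cycle.
        ax = csr_matvec(n, row_ptr, col_idx, vals, x)
        r = [bi - ai for bi, ai in zip(b, ax)]
        beta = math.sqrt(dot(r, r))
        if beta / b_norm < tol:
            break
        V = [[ri / beta for ri in r]]  # orthonormal Krylov basis
        H = []                         # H[j] holds column j of the rotated Hessenberg matrix
        cs, sn = [], []                # Givens rotation coefficients
        g = [beta]                     # rotated right-hand side of the least-squares problem
        k = 0                          # number of completed Arnoldi steps
        for j in range(m):
            # Arnoldi step with modified Gram-Schmidt.
            w = csr_matvec(n, row_ptr, col_idx, vals, V[j])
            h = []
            for i in range(j + 1):
                hij = dot(w, V[i])
                h.append(hij)
                w = [wi - hij * vi for wi, vi in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            h.append(h_next)
            # Apply the previous rotations to the new column.
            for i in range(j):
                t = cs[i] * h[i] + sn[i] * h[i + 1]
                h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
                h[i] = t
            # New rotation that zeroes the subdiagonal entry.
            denom = math.hypot(h[j], h[j + 1])
            if denom == 0.0:
                break  # breakdown with no usable direction; keep earlier steps
            c, s = h[j] / denom, h[j + 1] / denom
            cs.append(c)
            sn.append(s)
            h[j], h[j + 1] = denom, 0.0
            g.append(-s * g[j])
            g[j] = c * g[j]
            H.append(h)
            k = j + 1
            # |g[k]| is the residual norm of the current least-squares solution.
            if abs(g[k]) / b_norm < tol or h_next == 0.0:
                break
            V.append([wi / h_next for wi in w])
        # Back substitution on the k-by-k upper triangular system.
        y = [0.0] * k
        for i in range(k - 1, -1, -1):
            acc = g[i] - sum(H[j][i] * y[j] for j in range(i + 1, k))
            y[i] = acc / H[i][i]
        # Update the iterate with the Krylov correction.
        for j in range(k):
            x = [xi + y[j] * vj for xi, vj in zip(x, V[j])]
    return x
```

To use the sketch, pass the matrix as the three CSR arrays (`row_ptr`, `col_idx`, `vals`) together with the right-hand side `b`. The restart length `m` trades memory for convergence speed, since each cycle stores up to `m + 1` basis vectors.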