Conference: 2012 Joint 6th International Conference on Soft Computing and Intelligent Systems and 13th International Symposium on Advanced Intelligent Systems

Matrix-vector multiplication and triangular linear solver using GPGPU for symmetric positive definite matrices derived from elliptic equations



Abstract

Modern GPUs are well suited to computationally intensive tasks and massively parallel computation. Sparse matrix-vector multiplication and the triangular linear solver are among the most important and heavily used kernels in scientific computation, and several challenges in developing a high-performance kernel from these two modules are investigated. The main interest is to solve linear systems derived from elliptic equations with triangular elements. The resulting linear system has a symmetric positive definite matrix. The sparse matrix is stored in the compressed sparse row (CSR) format. A CUDA algorithm is proposed that performs the matrix-vector multiplication directly on the CSR format. A dependence tree algorithm is used to determine which variables the triangular linear solver can compute in parallel. To increase the number of parallel threads, a graph coloring algorithm is implemented to reorder the mesh numbering in a pre-processing phase. The proposed method is compared with available parallel and serial libraries. The results show that the proposed method reduces the computational cost of the matrix-vector multiplication. The pre-processing associated with the triangular solver needs to be executed only once in the proposed method. The conjugate gradient method was implemented and showed a similar convergence rate for all the compared methods. The proposed method showed a significantly smaller execution time.
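
The abstract does not reproduce the kernels themselves. As a rough illustration of the two ideas it names, the serial Python sketch below shows a row-wise CSR matrix-vector product and a dependency-level grouping for a lower-triangular solve. It is not the authors' CUDA code: the function names (`csr_matvec`, `dependency_levels`, `lower_solve_by_levels`), the small test matrix, and the choice of level sets to stand in for the "dependence tree" are all assumptions made for illustration. On a GPU, each row of the product, and each row within one level of the solve, could in principle be assigned to its own thread.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A @ x with A stored in CSR form.

    Rows are independent of each other, so each iteration of the outer
    loop could run as a separate thread.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dependency_levels(row_ptr, col_idx):
    """Assign a level to each unknown of a lower-triangular CSR matrix.

    Row i depends on every column j < i that has a stored entry, so its
    level is one more than the highest level among those columns.
    """
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    return level


def lower_solve_by_levels(row_ptr, col_idx, values, b):
    """Solve L x = b by forward substitution, one level at a time."""
    n = len(b)
    level = dependency_levels(row_ptr, col_idx)
    buckets = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        buckets[lv].append(i)
    x = [0.0] * n
    for rows in buckets:      # levels must run one after another
        for i in rows:        # rows inside a level are independent
            acc = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    acc -= values[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical 4x4 lower-triangular matrix:
# [[2, 0, 0, 0],
#  [0, 4, 0, 0],
#  [1, 0, 5, 0],
#  [0, 3, 1, 2]]
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
values = [2.0, 4.0, 1.0, 5.0, 3.0, 1.0, 2.0]

b = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0, 4.0])
levels = dependency_levels(row_ptr, col_idx)
x = lower_solve_by_levels(row_ptr, col_idx, values, b)
```

In this example rows 0 and 1 share a level and could be solved at the same time, while rows 2 and 3 must wait for earlier levels. Fewer, larger levels mean more parallel work per step, which is the kind of effect the reordering by graph coloring described in the abstract is meant to achieve.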
