Supernodal sparse Cholesky factorization on graphics processing units


Abstract

Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features for various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity involved in its efficient implementation and concerns about low GPU utilization.

In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for the GPU and propose a queue-based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree-based parallel method for multi-GPU systems. These approaches increase GPU utilization, thus resulting in substantial computational time reduction.

Comparisons are made with existing parallel solvers by using problems arising from practical applications. The experimental results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12-core node, we were able to obtain speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state-of-the-art solver based on the supernodal method for a CPU-GPU heterogeneous platform, we were able to obtain speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013.
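To illustrate the kind of queue-based task scheduling the abstract describes, below is a minimal, hypothetical sketch (not the authors' implementation): supernodes are ordered by an elimination tree, a supernode becomes ready once all of its children have been factorized, and ready supernodes are pulled from a queue. The function name `schedule_supernodes` and the `parent`-array encoding of the elimination tree are assumptions for illustration; in the paper's setting, each dequeued task would dispatch dense linear-algebra kernels (POTRF/TRSM/GEMM) to the GPU.

```python
from collections import deque

def schedule_supernodes(parent):
    """parent[i] is the elimination-tree parent of supernode i (-1 = root).
    Returns a factorization order in which every child precedes its parent."""
    n = len(parent)
    pending = [0] * n               # count of unfactorized children per supernode
    for p in parent:
        if p != -1:
            pending[p] += 1
    # Leaves have no children, so they seed the ready queue.
    queue = deque(i for i in range(n) if pending[i] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)             # "factorize" supernode s (dense GPU kernels in practice)
        p = parent[s]
        if p != -1:
            pending[p] -= 1
            if pending[p] == 0:     # last child finished: parent is now ready
                queue.append(p)
    return order

# Example elimination tree: supernodes 0,1 feed 2; supernodes 2,3 feed root 4.
print(schedule_supernodes([2, 2, 4, 4, -1]))
```

Independent subtrees of the elimination tree (here, the subtrees rooted at 2 and at 3) have no data dependencies between them, which is what makes the subtree-based partitioning across multiple GPUs mentioned in the abstract possible.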

Bibliographic details

  • Source
    CONCURRENCY PRACTICE & EXPERIENCE | 2014, Issue 16 | pp. 2713-2726 | 14 pages
  • Author affiliations

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification (CLC)
  • Keywords

    GPU; sparse Cholesky factorization; supernodal method;

