Supernodal sparse Cholesky factorization on graphics processing units


Abstract

Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features for various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity involved in its efficient implementation and concerns about low GPU utilization.

In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for the GPU and propose a queue-based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree-based parallel method for multi-GPU systems. These approaches increase GPU utilization, thus resulting in substantial computational time reduction.

Comparisons are made with existing parallel solvers by using problems arising from practical applications. The experimental results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12-core node, we were able to obtain speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state-of-the-art solver based on the supernodal method for a CPU-GPU heterogeneous platform, we were able to obtain speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013.
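To illustrate the kind of queue-based task scheduling the abstract describes, below is a minimal, hypothetical sketch (not the authors' implementation): supernodes are ordered by an elimination tree, a supernode becomes ready once all of its children have been factorized, and ready supernodes are pulled from a queue. The function name `schedule_supernodes` and the `parent`-array encoding of the elimination tree are assumptions for illustration; in the paper's setting, each dequeued task would dispatch dense linear-algebra kernels (POTRF/TRSM/GEMM) to the GPU.

```python
from collections import deque

def schedule_supernodes(parent):
    """parent[i] is the elimination-tree parent of supernode i (-1 = root).
    Returns a factorization order in which every child precedes its parent."""
    n = len(parent)
    pending = [0] * n               # count of unfactorized children per supernode
    for p in parent:
        if p != -1:
            pending[p] += 1
    # Leaves have no children, so they seed the ready queue.
    queue = deque(i for i in range(n) if pending[i] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)             # "factorize" supernode s (dense GPU kernels in practice)
        p = parent[s]
        if p != -1:
            pending[p] -= 1
            if pending[p] == 0:     # last child finished: parent is now ready
                queue.append(p)
    return order

# Example elimination tree: supernodes 0,1 feed 2; supernodes 2,3 feed root 4.
print(schedule_supernodes([2, 2, 4, 4, -1]))
```

Independent subtrees of the elimination tree (here, the subtrees rooted at 2 and at 3) have no data dependencies between them, which is what makes the subtree-based partitioning across multiple GPUs mentioned in the abstract possible.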

Bibliographic details

  • Source
    CONCURRENCY PRACTICE & EXPERIENCE | 2014, Issue 16 | pp. 2713-2726 | 14 pages
  • Author affiliations

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

    School of Computer, National University of Defense Technology, Changsha, China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification (CLC)
  • Keywords

    GPU; sparse Cholesky factorization; supernodal method;

