Optimized sparse Cholesky factorization on hybrid multicore architectures

Tang Meng; Gadou Mohamed; Rennich Steven; Davis Timothy A.; Ranka Sanjay

Abstract

We present techniques for supernodal sparse Cholesky factorization on a hybrid multicore platform consisting of a multicore CPU and a GPU. The techniques are the subtree algorithm, pipelining, and multithreading. The subtree algorithm [15] minimizes PCIe transfers by storing an entire branch of the elimination tree in GPU memory (the elimination tree is a tree data structure that describes the workflow of the factorization), and it also reduces total kernel launch time by launching BLAS kernels in batches. The pipelining technique overlaps the execution of GPU kernels with PCIe data transfers. The multithreading technique [17] creates multiple threads for both the CPU and the GPU to exploit the concurrency of the elimination tree. Our experimental results on a platform consisting of an Intel multicore processor and an Nvidia GPU indicate a significant improvement in performance and energy consumption over CHOLMOD (SuiteSparse 4.5.3), a sparse Cholesky factorization package, after these techniques are applied. © 2018 Elsevier B.V. All rights reserved.
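
The role of the elimination tree in the subtree idea can be illustrated with a small sketch. It is not the authors' implementation: it computes the elimination tree of a symmetric sparsity pattern with the standard ancestor-compression method, then picks the largest subtrees whose total storage fits a memory budget. The function names, the per-node `cost` values, and the `budget` are hypothetical and chosen only for illustration.

```python
def elimination_tree(n, upper_rows):
    """Return parent[] of the elimination tree (-1 marks a root).

    upper_rows[k] lists the rows i < k with A[i, k] != 0 for a
    symmetric n-by-n matrix A (assumed input layout for this sketch).
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current root
    for k in range(n):
        for i in upper_rows[k]:
            # Walk from i up to its current root and attach that root to k.
            while i != -1 and i < k:
                nxt = ancestor[i]
                ancestor[i] = k
                if nxt == -1:
                    parent[i] = k
                i = nxt
    return parent


def subtree_roots(parent, cost, budget):
    """Roots of the maximal subtrees whose summed cost fits in budget.

    cost[i] is a hypothetical storage estimate for node i (for example,
    bytes needed on the device); it is an assumption, not a paper value.
    """
    n = len(parent)
    size = list(cost)
    # In an elimination tree every child has a smaller index than its
    # parent, so one ascending pass accumulates complete subtree sizes.
    for i in range(n):
        if parent[i] != -1:
            size[parent[i]] += size[i]
    roots = []
    for i in range(n):
        fits = size[i] <= budget
        parent_fits = parent[i] != -1 and size[parent[i]] <= budget
        if fits and not parent_fits:
            roots.append(i)
    return roots


# 5-by-5 pattern with off-diagonal nonzeros at (0,2), (1,2), (2,4), (3,4).
upper = [[], [], [0, 1], [], [2, 3]]
parent = elimination_tree(5, upper)
roots = subtree_roots(parent, cost=[1, 1, 2, 1, 3], budget=4)
print(parent)  # [2, 2, 4, 4, -1]
print(roots)   # [2, 3]
```

In this toy case the subtree rooted at node 2 (nodes 0, 1, 2) and the single-node subtree at node 3 each fit within the budget, so each could be factorized without moving intermediate data off the device, while node 4 would be handled separately.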

Bibliographic record

Source
Journal of Computational Science | 2018, Issue 5 | pp. 246-253 | 8 pages
Authors
Tang Meng; Gadou Mohamed; Rennich Steven; Davis Timothy A.; Ranka Sanjay
Author affiliations

Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32611 USA;

Univ Florida, Dept Comp Informat Sci & Engn, Gainesville, FL 32611 USA;

NVIDIA, Santa Clara, CA USA;

Texas A&M Univ, Comp Sci & Engn Dept, College Stn, TX USA;

Univ Florida, Dept Comp Informat Sci & Engn, Gainesville, FL 32611 USA;

Original format: PDF
Language: English
Keywords
Sparse matrices; Sparse direct methods; Cholesky factorization; GPU; CUDA;

Similar literature

1. A Multithreaded Algorithm for Sparse Cholesky Factorization on Hybrid Multicore Architectures [J]. Meng Tang, Mohamed Gadou, Sanjay Ranka. Procedia Computer Science. 2017, Issue 1
2. A Hybrid CPU-GPU Multifrontal Optimizing Method in Sparse Cholesky Factorization [J]. Chen Yong, Jin Hai, Zheng Ran. Journal of signal processing systems for signal, image, and video technology. 2018, Issue 1
3. Design of a multicore sparse Cholesky factorization using DAGs [J]. Hogg J.D., Reid J.K., Scott J.A. SIAM Journal on Scientific Computing. 2011, Issue 6
4. A Hybrid Ordering Scheme for Efficient Sparse Cholesky Factorization [C]. LuYao, Zhenghua Wang, Zongzhe Li. International conference on computer and automation engineering. 2011
5. Performance Optimization for Sparse Matrix Factorization Algorithms on Hybrid Multicore Architectures [D]. Tang, Meng. 2020
6. Multicore-shell nanofiber architecture of polyimide/polyvinylidene fluoride blend for thermal and long-term stability of lithium ion battery separator [O]. Sejoon Park, Chung Woo Son, Sungho Lee
7. Design of a Multicore Sparse Cholesky Factorization Using DAGs [O]. J. D. Hogg, J. K. Reid, J. A. Scott. 2010
