A Hybrid CPU-GPU Multifrontal Optimizing Method in Sparse Cholesky Factorization

Chen Yong; Jin Hai; Zheng Ran; Liu Yuandong; Wang Wei

Abstract

In many scientific computing applications, sparse Cholesky factorization is used to solve large sparse linear systems in distributed environments. GPU computing offers a new way to approach the problem. However, sparse Cholesky factorization rarely achieves good performance on a GPU, because the matrix structure is irregular and GPU resource utilization is low. This paper proposes a hybrid CPU-GPU implementation of sparse Cholesky factorization based on the multifrontal method. In this method, a large sparse coefficient matrix is decomposed into a series of small dense matrices (frontal matrices), on which multiple GEMM (General Matrix-matrix Multiplication) operations are then computed. GEMMs are the dominant operations in sparse Cholesky factorization, but they are difficult to run efficiently in parallel on a GPU. To improve performance, a multiple-task-queue scheme is adopted so that the GEMMs produced by the multifrontal method execute in parallel; all GEMM tasks are scheduled dynamically across the GPU and the CPU according to their computation scale, to balance load and reduce computing time. Experimental results show that the approach outperforms the cuBLAS implementations, achieving up to 1.98x speedup on a GTX460 (Fermi micro-architecture) and up to 3.06x speedup on a K20m (Kepler micro-architecture).
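The abstract does not spell out the scheduling policy, so the following is only a minimal sketch of what "multiple task queues with allocation by computation scale" could look like. Every name here (`GemmTask`, `build_queues`, `gpu_min_flops`, `num_gpu_queues`) is hypothetical, and the fixed FLOP threshold is an assumption standing in for whatever dynamic rule the paper actually uses. Small GEMMs go to a CPU queue; larger ones are spread over several GPU queues, always choosing the currently least-loaded queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class GemmTask:
    """One dense update on a frontal matrix: an (m x k) by (k x n) product."""
    m: int
    n: int
    k: int

    @property
    def flops(self) -> int:
        # A GEMM of this shape costs about 2*m*n*k floating-point operations.
        return 2 * self.m * self.n * self.k


def build_queues(tasks, gpu_min_flops, num_gpu_queues=2):
    """Split GEMM tasks into one CPU queue and several GPU queues.

    Assumption (not from the paper): tasks below gpu_min_flops are too
    small to use the GPU well and are left to the CPU. Remaining tasks
    are assigned largest-first to the GPU queue with the least total work.
    """
    cpu_queue = deque()
    gpu_queues = [deque() for _ in range(num_gpu_queues)]
    gpu_load = [0] * num_gpu_queues

    for task in sorted(tasks, key=lambda t: t.flops, reverse=True):
        if task.flops < gpu_min_flops:
            cpu_queue.append(task)
        else:
            target = gpu_load.index(min(gpu_load))  # least-loaded GPU queue
            gpu_queues[target].append(task)
            gpu_load[target] += task.flops

    return cpu_queue, gpu_queues


tasks = [
    GemmTask(8, 8, 8),
    GemmTask(64, 64, 64),
    GemmTask(32, 32, 32),
    GemmTask(128, 128, 16),
]
cpu_queue, gpu_queues = build_queues(tasks, gpu_min_flops=10_000)
```

In a real implementation each GPU queue would presumably be drained by its own worker (for example, one CUDA stream per queue), and the CPU/GPU split would be adjusted at run time instead of fixed up front; this sketch only shows the size-based partitioning step.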

Bibliographic record

Source
Journal of signal processing systems for signal, image, and video technology | 2018, Issue 1 | pp. 53-67 | 15 pages
Authors
Chen Yong; Jin Hai; Zheng Ran; Liu Yuandong; Wang Wei
Author affiliations

Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Serv Comp Technol & Syst Lab, Big Data Technol & Syst Lab,Cluster & Grid Comp L, Wuhan 430074, Hubei, Peoples R China;

Indexing: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Multifrontal method; Multiple task queues scheme; Task allocation; GPU acceleration;
