Journal: ACM Transactions on Mathematical Software

Algorithm 980: Sparse QR Factorization on the GPU

Abstract

Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge and is significantly faster than a highly optimized method on a multicore CPU. Our method factorizes many frontal matrices in parallel and keeps all the data transmitted between frontal matrices on the GPU. A novel bucket scheduler algorithm extends the communication-avoiding QR factorization for dense matrices by exploiting more parallelism and by exploiting the staircase form present in the frontal matrices of a sparse multifrontal method.
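
For readers unfamiliar with the dense building block the abstract mentions, the following is a minimal sketch of a one-level communication-avoiding (tall-skinny) QR reduction in NumPy. It is not the paper's bucket scheduler or GPU code: the function name `tsqr_r`, the `block_rows` parameter, and the test matrix are hypothetical choices made for illustration. The idea shown is only that each row block can be factorized independently, and the small R factors can then be combined with one more QR.

```python
import numpy as np

def tsqr_r(A, block_rows):
    """Return the R factor of A via a one-level tall-skinny QR reduction.

    Illustrative sketch only; names and blocking are assumptions,
    not taken from the paper.
    """
    m, _ = A.shape
    r_factors = []
    # Each row block is factorized on its own; in a parallel setting
    # these factorizations would be independent tasks.
    for start in range(0, m, block_rows):
        block = A[start:start + block_rows]
        r_factors.append(np.linalg.qr(block, mode="r"))
    # Stack the small R factors and factorize once more to combine them.
    return np.linalg.qr(np.vstack(r_factors), mode="r")

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 4))
R = tsqr_r(A, block_rows=16)
```

The combined `R` satisfies `R.T @ R == A.T @ A` up to rounding, so it matches the R factor of a direct QR of `A` up to the signs of its rows.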