Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations

Abstract

Low-rank matrices arise in many scientific and engineering computations. Both the computational and storage costs of manipulating such matrices can be reduced by taking advantage of their low-rank properties. To compute a low-rank approximation of a dense matrix, this paper studies the performance of QR factorization with column pivoting or with restricted pivoting on multicore CPUs with a GPU. We first propose several techniques to reduce the postprocessing time that restricted pivoting requires on a modern CPU. We then examine the potential of using a GPU to accelerate the factorization process with both column and restricted pivoting. Our performance results on two eight-core Intel Sandy Bridge CPUs with one NVIDIA Kepler GPU demonstrate that using the GPU reduces the factorization time by a factor of more than two. In addition, to study the performance of our implementations in practice, we integrate them into StruMF, a recently developed software package that algebraically exploits such low-rank structures to solve a general sparse linear system of equations. Our performance results for solving Poisson's equations demonstrate that the proposed techniques can significantly reduce the preconditioner construction time of StruMF on the CPUs, and that using the GPU reduces the construction time by a further 10%-50%.
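
As a rough illustration of how QR factorization with column pivoting yields a low-rank approximation, the sketch below factors a small dense matrix and stops once the remaining columns are negligible. It is not the paper's implementation: it swaps in unblocked modified Gram-Schmidt in pure Python in place of blocked Householder kernels on CPUs and a GPU, and it does not cover restricted pivoting. The names `qrcp_low_rank` and `reconstruct` and the relative tolerance `tol` are assumptions made for this example.

```python
import math


def qrcp_low_rank(A, tol=1e-10):
    """Truncated QR with column pivoting (modified Gram-Schmidt sketch).

    A is a list of m rows with n floats each. Returns (Q, R, perm, k) where
    Q holds k orthonormal vectors of length m, R holds k rows of length n,
    and column perm[j] of A is approximated by sum_l Q[l] * R[l][j].
    """
    m, n = len(A), len(A[0])
    # cols[j] is the current residual of the j-th (permuted) column.
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    perm = list(range(n))
    Q, R = [], []
    # Largest initial column norm, used to make the stopping test relative.
    norm0 = max(math.sqrt(sum(x * x for x in c)) for c in cols)
    for k in range(min(m, n)):
        # Column pivoting: choose the remaining column with the largest norm.
        norms = [math.sqrt(sum(x * x for x in cols[j])) for j in range(k, n)]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        pivot_norm = norms[p - k]
        if pivot_norm <= tol * norm0:
            break  # remaining columns are negligible; numerical rank is k
        # Swap the pivot column into position k, in the data and in R.
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]
        for row in R:
            row[k], row[p] = row[p], row[k]
        q = [x / pivot_norm for x in cols[k]]
        r = [0.0] * n
        r[k] = pivot_norm
        # Remove the component along q from every trailing column.
        for j in range(k + 1, n):
            r[j] = sum(qi * cj for qi, cj in zip(q, cols[j]))
            cols[j] = [cj - r[j] * qi for qi, cj in zip(q, cols[j])]
        Q.append(q)
        R.append(r)
    return Q, R, perm, len(Q)


def reconstruct(Q, R, perm, m, n):
    """Rebuild the rank-k approximation in the original column order."""
    B = [[0.0] * n for _ in range(m)]
    for q, r in zip(Q, R):
        for i in range(m):
            for j in range(n):
                B[i][perm[j]] += q[i] * r[j]
    return B
```

Calling `qrcp_low_rank(A)` on an m-by-n matrix of numerical rank k returns factors that take roughly k(m + n) numbers to store instead of mn, which is the storage saving the abstract refers to.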

Bibliographic record

  • Source
    Scientific Programming | 2015, Issue 2015 | 246019.1-246019.17 | 17 pages
  • Author affiliations

    Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA;

    Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA;

    Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA;

  • Indexed in: Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English