Journal: Parallel Computing

Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression


Abstract

We present high-performance implementations of the QR decomposition and the singular value decomposition (SVD) of a batch of small matrices hosted on the GPU, with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low-rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs. (C) 2017 Elsevier B.V. All rights reserved.
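To make the one-sided Jacobi building block mentioned in the abstract concrete, here is a minimal CPU-side sketch in pure Python. It is not the paper's GPU kernels. The function names `one_sided_jacobi_svd` and `batched_svd`, the tolerance, and the sweep limit are all assumptions chosen for illustration. The sketch rotates pairs of columns until they are mutually orthogonal; the column norms are then the singular values. Each matrix in the batch is processed independently, which is the property that makes the method a natural fit for batching.

```python
import math


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """Return (u, s, v) with a ~= u * diag(s) * v^T for an m x n matrix (m >= n).

    `a` is a list of rows. Singular values are returned unsorted.
    """
    m, n = len(a), len(a[0])
    g = [row[:] for row in a]  # working copy; its columns converge to u * diag(s)
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Squared norms and inner product of columns p and q.
                alpha = sum(g[i][p] * g[i][p] for i in range(m))
                beta = sum(g[i][q] * g[i][q] for i in range(m))
                gamma = sum(g[i][p] * g[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns are already numerically orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Apply the same rotation to the columns of g and of v.
                for mat, rows in ((g, m), (v, n)):
                    for i in range(rows):
                        x, y = mat[i][p], mat[i][q]
                        mat[i][p] = c * x - s * y
                        mat[i][q] = s * x + c * y
        if not rotated:
            break  # a full sweep without rotations means convergence

    sing = [math.sqrt(sum(g[i][j] ** 2 for i in range(m))) for j in range(n)]
    u = [[g[i][j] / sing[j] if sing[j] > 0.0 else 0.0 for j in range(n)]
         for i in range(m)]
    return u, sing, v


def batched_svd(batch):
    """Factor every matrix in the batch; the iterations are independent."""
    return [one_sided_jacobi_svd(mat) for mat in batch]
```

A call such as `batched_svd([[[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]]])` returns one `(u, s, v)` triple per input matrix. In this sketch the batch loop is sequential; on a GPU the independent factorizations would run concurrently.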
