...
Published in: Procedia Computer Science

One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators*

Abstract

One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions of both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to exploit the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that factorize only a submatrix of the coefficient matrix on a GPU at a time. We then extend the algorithms to use multiple GPUs attached to a multicore. These extensions not only enable the factorization of a matrix that does not fit in the aggregate memory of the GPUs all at once, but also offer the potential to fully utilize the computing power of the architecture. Since data movement is expensive on current architectures, the algorithms are designed to minimize data movement at multiple levels. To demonstrate their effectiveness, we present their performance on a single compute node of the Keeneland system, which consists of twelve Intel Xeon processors and three NVIDIA GPUs. The results show negligible overhead for the non-GPU-resident algorithms and scalable performance for the multi-GPU algorithms. These extensions are now part of the MAGMA software package, a set of state-of-the-art dense linear algebra routines for a multicore with GPUs.
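
The non-GPU-resident idea for the left-looking Cholesky case can be illustrated with a small sketch. This is not the MAGMA implementation and is not taken from the paper: the function name `non_resident_cholesky` and the parameter `panel_width` are hypothetical, the "device" is simulated with plain Python lists, and the overlap of transfers with computation and the multi-GPU distribution are left out. It only shows the access pattern in which one column panel is held and factorized at a time while previously factored panels are streamed in to update it.

```python
import math


def non_resident_cholesky(a, panel_width):
    """Left-looking blocked Cholesky that holds one column panel at a time.

    `a` is a symmetric positive definite matrix given as a list of rows and
    stands in for host memory. Returns lower-triangular L with L * L^T == a.
    The "device" buffers are ordinary lists (an assumption for illustration).
    """
    n = len(a)
    host = [list(map(float, row)) for row in a]  # lower triangle becomes L

    for j0 in range(0, n, panel_width):
        j1 = min(j0 + panel_width, n)
        rows = n - j0   # panel covers rows j0..n-1
        cols = j1 - j0  # and columns j0..j1-1

        # "Upload" only the current panel to the simulated device.
        panel = [host[j0 + i][j0:j1] for i in range(rows)]

        # Left-looking update: stream each already-factored panel through
        # the device and subtract its contribution from the current panel.
        for k0 in range(0, j0, panel_width):
            k1 = k0 + panel_width
            prev = [host[j0 + i][k0:k1] for i in range(rows)]
            for i in range(rows):
                for c in range(cols):
                    panel[i][c] -= sum(
                        prev[i][k] * prev[c][k] for k in range(k1 - k0)
                    )

        # Factor the updated panel column by column (unblocked Cholesky).
        for c in range(cols):
            for k in range(c):
                for i in range(c, rows):
                    panel[i][c] -= panel[i][k] * panel[c][k]
            pivot = math.sqrt(panel[c][c])
            for i in range(c, rows):
                panel[i][c] /= pivot

        # "Download" the factored panel back to host memory.
        for i in range(rows):
            host[j0 + i][j0:j1] = panel[i]

    # Clear the strictly upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            host[i][j] = 0.0
    return host
```

For the matrix `[[4, 2, 2], [2, 5, 3], [2, 3, 6]]` the function returns `[[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]` for any panel width. In this sketch a smaller `panel_width` lowers the amount of data held at once but increases the number of times earlier panels are re-read, which is the data-movement trade-off the abstract refers to.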
