One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators

Ichitaro Yamazaki; Stanimire Tomov; Jack Dongarra

Abstract

One-sided dense matrix factorizations are important computational kernels in many scientific and engineering simulations. In this paper, we propose two extensions to both right-looking (LU and QR) and left-looking (Cholesky) one-sided factorization algorithms to exploit the computing power of current heterogeneous architectures. We first describe a new class of non-GPU-resident algorithms that factorize only one submatrix of the coefficient matrix on a GPU at a time. We then extend these algorithms to use multiple GPUs attached to a multicore. Together, these extensions enable the factorization of matrices that do not fit in the aggregate memory of the GPUs at once, and they have the potential to fully utilize the computing power of the architecture. Because data movement is expensive on current architectures, the algorithms are designed to minimize data movement at multiple levels. To demonstrate their effectiveness, we present performance results on a single compute node of the Keeneland system, which consists of twelve Intel Xeon processor cores and three NVIDIA GPUs. The results show that the non-GPU-resident algorithms incur negligible overhead and that the multi-GPU algorithms achieve scalable performance. These extensions are now part of the MAGMA software package, a collection of state-of-the-art dense linear algebra routines for multicore systems with GPUs.
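The left-looking, one-submatrix-at-a-time idea described above can be illustrated with a minimal pure-Python sketch. It is not MAGMA code, and it runs entirely on the CPU. The following are assumptions made for illustration:

- the function names `left_looking_cholesky` and `block_owners`;
- the panel width `nb`;
- the round-robin (1D block-cyclic) mapping of block columns to `ngpu` devices.

Each outer step treats the block column `[j0, j1)` as the only submatrix "resident" on an accelerator. It first updates that block with the previously factored columns to its left, then factors it.

```python
import math
from typing import List

Matrix = List[List[float]]


def left_looking_cholesky(a: Matrix, nb: int) -> Matrix:
    """Return lower-triangular L with A = L L^T for a symmetric positive definite A.

    Columns are processed in blocks of width nb (a hypothetical panel width).
    Only the current block column is copied into `panel`, loosely mimicking a
    device that holds one submatrix at a time; earlier columns of L are only read.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j0 in range(0, n, nb):
        j1 = min(j0 + nb, n)
        # "Load" the current block column of A; only rows j..n-1 of column j are used.
        panel = {j: [float(a[i][j]) for i in range(n)] for j in range(j0, j1)}

        # Left-looking update: subtract contributions of all columns factored earlier.
        for k in range(j0):
            for j in range(j0, j1):
                ljk = l[j][k]
                col = panel[j]
                for i in range(j, n):
                    col[i] -= l[i][k] * ljk

        # Factor the panel itself, column by column (unblocked left-looking).
        for j in range(j0, j1):
            col = panel[j]
            for k in range(j0, j):
                ljk = l[j][k]
                for i in range(j, n):
                    col[i] -= l[i][k] * ljk
            if col[j] <= 0.0:
                raise ValueError("matrix is not positive definite")
            d = math.sqrt(col[j])
            l[j][j] = d
            for i in range(j + 1, n):
                l[i][j] = col[i] / d
    return l


def block_owners(n: int, nb: int, ngpu: int) -> List[int]:
    """Hypothetical 1D block-cyclic map: block column b is assigned to device b % ngpu."""
    nblocks = (n + nb - 1) // nb
    return [b % ngpu for b in range(nblocks)]


if __name__ == "__main__":
    A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    print(left_looking_cholesky(A, 2))  # [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
    print(block_owners(10, 3, 2))       # [0, 1, 0, 1]
```

The choice of `nb` sets a data-movement trade-off in this sketch. Every outer step rereads all previously factored columns, so a wider panel means fewer outer steps and fewer rereads. The cost is that more of the matrix must be held at once. The blocking parameters actually used in the paper are not reproduced here.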

Bibliographic details

Source
Procedia Computer Science | 2012, Issue 1 | 10 pages
Authors
Ichitaro Yamazaki; Stanimire Tomov; Jack Dongarra
Original format
PDF
Chinese Library Classification
Computing technology, computer technology

Similar literature

1. KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators [J]. Ahmad Abdelfattah, David Keyes, Hatem Ltaief. ACM Transactions on Mathematical Software, 2016, no. 3.
2. Modeling power and energy consumption of dense matrix factorizations on multicore processors [J]. Pedro Alonso, Manuel F. Dolz, Rafael Mayo. Concurrency, Practice and Experience, 2014, no. 17.
3. Multi-Sensor Task Assignment Using Linear Matrix Inequalities in the Multiple Target Tracking Problem [J]. Victoria Erofeeva, Oleg Granichin, Olga Granichina. IFAC PapersOnLine, 2018, no. 15.
4. One-sided dense matrix factorizations on a multicore with multiple GPU accelerators [C]. Ichitaro Yamazaki, Stanimire Tomov, Jack Dongarra. International Conference on Computational Science, 2013.
5. Fault Tolerant and Energy Efficient One-Sided Matrix Decompositions on Heterogeneous Systems with GPUs [D]. Jieyang Chen, 2019.
6. NMF-mGPU: non-negative matrix factorization on multi-GPU systems [O]. Edgardo Mejía-Roa, Daniel Tabas-Madrid, Javier Setoain, 2015.
7. One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators [O]. Ichitaro Yamazaki, Stanimire Tomov, Jack Dongarra, 2012.
