LU Factorization with Partial Pivoting for a Multicore System with Accelerators

Jakub Kurzak; Piotr Luszczek; Mathieu Faverge; Jack Dongarra

Abstract

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This paper presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. Implementing the algorithm on such a system is difficult for two reasons: the disproportion between the computational power of the CPUs and that of the GPUs, and the meager bandwidth of the communication link between their memory systems. An additional challenge comes from the memory-bound and synchronization-rich nature of the panel factorization component of the block LU algorithm, which is imposed by the use of partial pivoting. These challenges are tackled with a data layout geared toward complex memory hierarchies, autotuning of GPU kernels, fine-grain parallelization of memory-bound CPU operations, and dynamic scheduling of tasks to different devices. Performance in excess of one TeraFLOPS is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
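The abstract names the two halves of block LU only in passing. The following is a minimal sketch, assuming a dense square matrix held as plain Python lists of rows; it does not reproduce the paper's data layout, GPU kernels or task scheduler. It only shows where the pivot-driven panel factorization ends and where the compute-heavy trailing update begins. The names `blocked_lu`, `unpack_lu`, `apply_pivots`, `matmul` and the panel width `nb` are hypothetical, chosen for illustration.

```python
# Illustrative sketch only (assumption): plain Python lists, no tile layout, no GPU.

def blocked_lu(a, nb):
    """In-place right-looking blocked LU with partial pivoting.

    a  -- square matrix as a list of rows (overwritten with packed L and U)
    nb -- panel width (hypothetical tuning parameter)
    Returns piv, where row k was swapped with row piv[k] at step k.
    """
    n = len(a)
    piv = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Panel factorization: unblocked LU of columns k0..k1-1, all rows below.
        for j in range(k0, k1):
            # Partial pivoting: largest-magnitude entry in column j.
            p = max(range(j, n), key=lambda i: abs(a[i][j]))
            piv[j] = p
            if a[p][j] == 0.0:
                raise ZeroDivisionError("matrix is singular")
            # Swap whole rows, so the interchange also reaches L and trailing columns.
            a[j], a[p] = a[p], a[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]                 # multiplier = entry of L
                for c in range(j + 1, k1):         # rank-1 update inside the panel only
                    a[i][c] -= a[i][j] * a[j][c]
        # Triangular solve: U12 = inv(L11) * A12, with L11 unit lower triangular.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                for c in range(k1, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Trailing update (GEMM-shaped): A22 -= L21 * U12.
        for i in range(k1, n):
            for j in range(k0, k1):
                lij = a[i][j]
                for c in range(k1, n):
                    a[i][c] -= lij * a[j][c]
    return piv


def unpack_lu(f):
    """Split the packed factor into unit-lower L and upper U."""
    n = len(f)
    L = [[f[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    U = [[f[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def apply_pivots(a, piv):
    """Return a copy of a with the recorded row interchanges applied in order (P*A)."""
    b = [row[:] for row in a]
    for k, p in enumerate(piv):
        b[k], b[p] = b[p], b[k]
    return b


def matmul(x, y):
    """Plain triple-loop matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0, 0.0],
         [4.0, 3.0, 3.0, 1.0],
         [8.0, 7.0, 9.0, 5.0],
         [6.0, 7.0, 9.0, 8.0]]
    F = [row[:] for row in A]
    piv = blocked_lu(F, nb=2)
    L, U = unpack_lu(F)
    PA, LU = apply_pivots(A, piv), matmul(L, U)
    err = max(abs(PA[i][j] - LU[i][j]) for i in range(4) for j in range(4))
    print("pivots:", piv, "max |PA - LU| =", err)
```

Within the panel, each column needs a search over the whole remaining column and a row swap before the next column can proceed, which is the serial, synchronization-heavy step the abstract refers to. The final triple loop, by contrast, is a matrix-multiply update whose cost dominates for large matrices; in a hybrid CPU/GPU setting it would be the natural candidate for offloading to the accelerators.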

Bibliographic Details

Source: IEEE Transactions on Parallel and Distributed Systems, 2013, Issue 8, pp. 1613-1621 (9 pages)
Authors: Jakub Kurzak; Piotr Luszczek; Mathieu Faverge; Jack Dongarra
Affiliation: University of Tennessee, Knoxville
Original format: PDF
Language: English
Keywords: GPU; Gaussian elimination; LU factorization; accelerator; manycore; multicore; partial pivoting

