Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting

Jack Dongarra; Mathieu Faverge; Hatem Ltaief; Piotr Luszczek


Abstract

The LU factorization is an important numerical algorithm for solving systems of linear equations in science and engineering and is a characteristic of many dense linear algebra computations. For example, it has become the de facto numerical algorithm implemented within the LINPACK benchmark to rank the most powerful supercomputers in the world, collected by the TOP500 website. Multicore processors continue to present challenges to the development of fast and robust numerical software due to the increasing levels of hardware parallelism and widening gap between core and memory speeds. In this context, the difficulty in developing new algorithms for the scientific community resides in the combination of two goals: achieving high performance while maintaining the accuracy of the numerical algorithm. This paper proposes a new approach for computing the LU factorization in parallel on multicore architectures, which not only improves the overall performance but also sustains the numerical quality of the standard LU factorization algorithm with partial pivoting. While the update of the trailing submatrix is computationally intensive and highly parallel, the inherently problematic portion of the LU factorization is the panel factorization due to its memory-bound characteristic as well as the atomicity of selecting the appropriate pivots. Our approach uses a parallel fine-grained recursive formulation of the panel factorization step and implements the update of the trailing submatrix with the tile algorithm. Based on conflict-free partitioning of the data and lockless synchronization mechanisms, our implementation lets the overall computation flow naturally without contention. The dynamic runtime system called QUARK is then able to schedule tasks with heterogeneous granularities and to transparently introduce algorithmic lookahead. The performance results of our implementation are competitive compared to the currently available software packages and libraries. For example, it is up to 40% faster when compared to the equivalent Intel MKL routine and up to threefold faster than LAPACK with multithreaded Intel MKL BLAS.
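
To make the recursive formulation mentioned in the abstract more concrete, the following is a minimal sequential sketch of recursive LU factorization with partial pivoting in plain Python. It is an illustration only, not the authors' implementation: it has no parallelism, no tiles, and no QUARK scheduling, and the names `lu_recursive` and `factor_panel` are hypothetical. The idea it shows is that a panel of columns is split in half, the left half is factored recursively, its pivots and a triangular solve are applied to the right half, the trailing block is updated, and the right half is factored recursively.

```python
def lu_recursive(matrix):
    """Sequential sketch of recursive LU with partial pivoting.

    Assumes a square matrix given as a list of rows. Returns (lu, piv):
    lu holds the unit lower factor L below the diagonal and U on and
    above it; piv[k] is the row swapped with row k at step k.
    """
    a = [[float(x) for x in row] for row in matrix]  # work on a copy
    m = len(a)
    piv = list(range(m))

    def swap_rows(k_range, col_range):
        # Apply the recorded row swaps, in order, to the given columns only.
        for k in k_range:
            p = piv[k]
            if p != k:
                for c in col_range:
                    a[k][c], a[p][c] = a[p][c], a[k][c]

    def factor_panel(j0, n):
        # Factor columns [j0, j0 + n) over rows [j0, m).
        if n == 1:
            # Base case: pick the largest entry in the column as pivot.
            p = max(range(j0, m), key=lambda i: abs(a[i][j0]))
            piv[j0] = p
            a[j0][j0], a[p][j0] = a[p][j0], a[j0][j0]
            d = a[j0][j0]
            if d != 0.0:  # a zero pivot means the column is already zero
                for i in range(j0 + 1, m):
                    a[i][j0] /= d
            return

        n1 = n // 2
        j1 = j0 + n1            # first column of the right half
        jend = j0 + n

        factor_panel(j0, n1)    # 1. factor the left half recursively
        swap_rows(range(j0, j1), range(j1, jend))  # 2. pivots -> right half

        # 3. Triangular solve with the unit lower block: A12 <- L11^{-1} A12.
        for c in range(j1, jend):
            for i in range(j0, j1):
                for k in range(j0, i):
                    a[i][c] -= a[i][k] * a[k][c]

        # 4. Update the trailing block: A22 <- A22 - A21 * A12.
        for i in range(j1, m):
            for c in range(j1, jend):
                s = 0.0
                for k in range(j0, j1):
                    s += a[i][k] * a[k][c]
                a[i][c] -= s

        factor_panel(j1, n - n1)  # 5. factor the right half recursively
        swap_rows(range(j1, jend), range(j0, j1))  # 6. pivots -> left half

    factor_panel(0, m)
    return a, piv


if __name__ == "__main__":
    lu, piv = lu_recursive([[1.0, 2.0], [3.0, 4.0]])
    print(lu, piv)
```

Because each pivot is the largest entry in its column, every multiplier stored in L has magnitude at most 1, which is the property of standard partial pivoting that the abstract says the approach preserves. In a parallel setting, steps 3 and 4 are the parts that would be distributed across threads; this sketch leaves them as plain loops.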


Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2014, Issue 7 | pp. 1408-1431 | 24 pages
Authors
Jack Dongarra; Mathieu Faverge; Hatem Ltaief; Piotr Luszczek
Author affiliations

Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA;

Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA;

KAUST Supercomputing Laboratory, Thuwal, Saudi Arabia;

Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA;

Original format: PDF
Language: English
Keywords
recursion; LU factorization; parallel linear algebra; shared memory synchronization; threaded parallelism;


Similar literature

1. LU Factorization with Partial Pivoting for a Multicore System with Accelerators [J]. Kurzak Jakub, Luszczek Piotr, Faverge Mathieu, IEEE Transactions on Parallel and Distributed Systems. 2013, Issue 8
2. A Supernodal Approach to Incomplete LU Factorization with Partial Pivoting [J]. Xiaoye S. Li, Meiyue Shao. ACM Transactions on Mathematical Software. 2011, Issue 4
3. On the Row Merge Tree for Sparse LU Factorization with Partial Pivoting [J]. L. Grigori, M. Cosnard, E. G. Ng. BIT Numerical Mathematics. 2007, Issue 1
4. Saving Energy in the LU Factorization with Partial Pivoting on Multi-core Processors [C]. Alonso Pedro, Dolz Manuel F., Igual Francisco D., Parallel, Distributed and Network-Based Processing (PDP), 2012 20th Euromicro International Conference on. 2012
5. Evaluating the Performance of a Multi-Tile Macroalgae Cultivation Structure Using Physical and Numerical Modeling [D]. Davonski, Zachary. 2020
6. Accuracy and Performance of Functional Parameter Estimation Using a Novel Numerical Optimization Approach for GPU-Based Kinetic Compartmental Modeling [O]. Igor Svistoun, Brandon Driscoll, Catherine Coolens. 2019
7. LU Factorization with Partial Pivoting for a Multicore System with Accelerators [O]. Jakub Kurzak, Piotr Luszczek, Mathieu Faverge, 2013
