Tile QR factorization with parallel panel processing for multicore architectures

Abstract

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine-grained tasks, where nodes represent tasks (either the factorization of a panel or the update of a block-column) and edges represent dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their sequential processing of the panel limits performance when factorizing tall and skinny matrices or small square matrices. We present a new fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that performs the panel factorization in parallel (named Communication-Avoiding QR and initially designed for distributed-memory machines) to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvement (up to almost 10 times faster) compared to state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) library.
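
The abstract's central point is that a panel reduced in sequence puts a long chain of dependent tasks on the critical path, which dominates when the matrix has many tile rows but few tile columns. The Python sketch below illustrates this in a deliberately simplified, unit-cost model: it builds only the panel tasks for one tile column (trailing block-column updates are omitted), splits the tile rows into contiguous domains that are each reduced with a flat tree, and merges the domain results with a binary tree, in the spirit of communication-avoiding QR. The kernel labels GEQRT, TSQRT and TTQRT follow PLASMA-style naming but serve only as tags here; the helpers `panel_dag` and `critical_path` and the domain-splitting rule are hypothetical and are not the paper's implementation.

```python
"""Illustrative unit-cost model of panel-reduction DAGs for tile QR.

Hypothetical helpers for illustration only; not PLASMA's API and not the
paper's code. Every task counts as one time step.
"""


def panel_dag(mt, domains):
    """Build a task DAG that reduces one tile column of `mt` tiles to a single R.

    The column is cut into at most `domains` contiguous groups of tile rows.
    Each group is reduced sequentially (a flat tree: GEQRT on its first tile,
    then one TSQRT per remaining tile), and the group heads are then merged
    pairwise with a binary tree of TTQRT tasks. Returns {task: [predecessors]},
    with tasks inserted in a valid topological order.
    """
    tasks = {}
    size = -(-mt // domains)  # ceiling division: tile rows per group
    heads = []  # (row index holding the group's R, last task writing that R)
    for start in range(0, mt, size):
        last = ("GEQRT", start)
        tasks[last] = []
        for i in range(start + 1, min(start + size, mt)):
            task = ("TSQRT", start, i)  # annihilate tile i against R in tile `start`
            tasks[task] = [last]        # serialized: each step rewrites that R
            last = task
        heads.append((start, last))
    while len(heads) > 1:  # one level of the binary merge tree per iteration
        merged = []
        for k in range(0, len(heads) - 1, 2):
            (top, t_top), (bot, t_bot) = heads[k], heads[k + 1]
            task = ("TTQRT", top, bot)  # merge two triangular R factors
            tasks[task] = [t_top, t_bot]
            merged.append((top, task))
        if len(heads) % 2:  # the odd head out waits for the next level
            merged.append(heads[-1])
        heads = merged
    return tasks


def critical_path(tasks):
    """Length of the longest dependency chain, counting every task as one step."""
    depth = {}
    for task, preds in tasks.items():  # insertion order is topological
        depth[task] = 1 + max((depth[p] for p in preds), default=0)
    return max(depth.values())


if __name__ == "__main__":
    mt = 16  # a tall-and-skinny panel: 16 tile rows, 1 tile column
    for domains in (1, 2, 4, 8, 16):
        dag = panel_dag(mt, domains)
        print(f"domains={domains:2d} tasks={len(dag):2d} critical_path={critical_path(dag)}")
```

Under these assumptions, a 16-tile panel processed sequentially (`domains=1`) has a critical path of 16 tasks; 4 domains cut it to 6 at the cost of 3 extra merge tasks (19 tasks instead of 16), and a pure binary tree (`domains=16`) reaches 5 with 31 tasks. Real kernels have different costs, so these counts only indicate the available parallelism, not run time.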

Bibliographic details

Source
IEEE International Symposium on Parallel & Distributed Processing, 2010, 10 pages
Authors
Hadri B.; Ltaief H.; Agullo E.; Dongarra J.
Original format: PDF
CLC classification: TP311.138-53
Keywords
Communication Avoiding; Dynamic scheduling; Multicore; QR factorization; Tile Algorithms;

Similar documents

1. Parallel tiled QR factorization for multicore architectures [J]. Alfredo Buttari, Julien Langou, Jakub Kurzak. Concurrency and Computation, 2008, issue 13.

2. The Parallel Tiled WZ Factorization Algorithm for Multicore Architectures [J]. Beata Bylina, Jarosław Bylina. International Journal of Applied Mathematics and Computer Science, 2019, issue 2.

3. The Parallel Tiled WZ Factorization Algorithm for Multicore Architectures [J]. Beata Bylina, Jarosław Bylina. International Journal of Applied Mathematics and Computer Science, 2019, issue 2.

4. Tile QR factorization with parallel panel processing for multicore architectures [C]. Hadri B., Ltaief H., Agullo E. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.

5. Maintaining high performance in the QR factorization while scaling both problem size and parallelism [D]. Samuel, Siju. 2011.

6. Exploiting Thread-Level and Instruction-Level Parallelism to Cluster Mass Spectrometry Data using Multicore Architectures [O]. Fahad Saeed, Jason D. Hoffert, Trairak Pisitkun.

7. Tile QR Factorization with Parallel Panel Processing for Multicore Architectures [O]. Bilel Hadri, Hatem Ltaief, Emmanuel Agullo. 2009.
