Conference: IEEE International Symposium on Parallel Distributed Processing

Tile QR factorization with parallel panel processing for multicore architectures


Abstract

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which schedule a Directed Acyclic Graph (DAG) of fine-grained tasks: nodes represent tasks (either the factorization of a panel or the update of a block-column), and edges represent the dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, they process each panel sequentially, which limits performance when factorizing tall and skinny matrices or small square matrices. We present a new fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that performs the panel factorization in parallel (named Communication-Avoiding QR and initially designed for distributed-memory machines) to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvement (up to almost 10 times faster) compared to state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) library.
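The difference between sequential and parallel panel processing can be illustrated by the order in which the tiles of one panel are merged. The sketch below is not taken from the paper or from PLASMA: the function names are hypothetical, and the binary tree is only one possible reduction shape, assumed here for illustration. It models the dependency structure only, not the numerical QR kernels. Each pair `(a, b)` means "tile `b` is eliminated into tile `a`", and all pairs within one round are independent of each other.

```python
def flat_reduction(num_tiles):
    """Sequential panel: tile 0 absorbs tiles 1..num_tiles-1 one at a time.

    Every round contains a single pair, so the rounds form a dependency chain.
    """
    return [[(0, i)] for i in range(1, num_tiles)]


def binary_tree_reduction(num_tiles):
    """Parallel panel (assumed binary tree): surviving tiles are paired up,
    so their number is roughly halved in every round."""
    rounds = []
    alive = list(range(num_tiles))
    while len(alive) > 1:
        # Pair neighbours among the surviving tiles; pairs are independent.
        pairs = [(alive[k], alive[k + 1]) for k in range(0, len(alive) - 1, 2)]
        rounds.append(pairs)
        # The first tile of each pair survives; with an odd count, the
        # unpaired last tile is carried over to the next round.
        alive = alive[::2]
    return rounds


if __name__ == "__main__":
    p = 8  # number of tiles in one panel (illustrative value)
    print("flat rounds:       ", len(flat_reduction(p)))
    print("binary-tree rounds:", len(binary_tree_reduction(p)))
```

For a panel of 8 tiles, the flat order needs 7 dependent steps, whereas the binary tree needs 3 rounds. Both perform the same number of eliminations (7), so the gain comes from shortening the critical path of the panel, which matters most for tall and skinny matrices where the panel has many tiles.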
