Venue: IEEE International Parallel and Distributed Processing Symposium
Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures

Abstract

Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, there is no scalable factorization algorithm that takes advantage of the symmetry and guarantees numerical stability through pivoting at the same time. This is because such an algorithm exhibits many of the fundamental challenges in parallel programming, such as irregular data accesses and irregular task dependencies. In this paper, we address these challenges in a tiled implementation of a blocked Aasen's algorithm using a dynamic scheduler. To fully exploit the limited parallelism in this left-looking algorithm, we study several performance-enhancing techniques, e.g., parallel reduction to update a panel, tall-skinny LU factorization algorithms to factorize the panel, and a parallel implementation of symmetric pivoting. Our performance results on up to 48 AMD Opteron processors demonstrate that our implementation obtains speedups of up to 2.8 over MKL, while losing only one or two digits in the computed residual norms.
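
For readers unfamiliar with the factorization the abstract refers to: Aasen's algorithm factors a symmetric matrix A as P A P^T = L T L^T, where P is a permutation chosen by symmetric pivoting, L is unit lower triangular, and T is symmetric tridiagonal. The sketch below is not the paper's blocked, tiled implementation, and it is not Aasen's algorithm itself. It uses the simpler Parlett-Reid elimination, which produces a factorization of the same form at roughly twice the operation count, purely to make the structure of L, T, and the pivoting concrete. The function name `parlett_reid_ltlt` and the 6-by-6 test matrix are illustrative assumptions.

```python
import numpy as np


def parlett_reid_ltlt(a):
    """Factor symmetric `a` as a[perm][:, perm] = L @ T @ L.T (Parlett-Reid).

    L is unit lower triangular, T is symmetric tridiagonal, and `perm` is the
    symmetric pivoting order. Illustrative only: unblocked and sequential.
    """
    t = np.array(a, dtype=float)  # working copy, reduced to tridiagonal form
    n = t.shape[0]
    l = np.eye(n)
    perm = np.arange(n)

    for k in range(n - 2):
        # Symmetric pivoting: move the largest entry of column k below the
        # diagonal into the subdiagonal position (k + 1, k).
        p = k + 1 + int(np.argmax(np.abs(t[k + 1:, k])))
        if p != k + 1:
            t[[k + 1, p], :] = t[[p, k + 1], :]
            t[:, [k + 1, p]] = t[:, [p, k + 1]]
            l[[k + 1, p], :k + 1] = l[[p, k + 1], :k + 1]
            perm[[k + 1, p]] = perm[[p, k + 1]]

        pivot = t[k + 1, k]
        if pivot == 0.0:
            continue  # column k is already zero below the subdiagonal

        # Gauss transform that eliminates t[k+2:, k], applied from both sides
        # so that the working matrix stays symmetric.
        v = t[k + 2:, k] / pivot
        l[k + 2:, k + 1] = v
        t[k + 2:, :] -= np.outer(v, t[k + 1, :])
        t[:, k + 2:] -= np.outer(t[:, k + 1], v)

    # Keep only the tridiagonal band; entries outside it are rounding noise.
    t = np.triu(np.tril(t, 1), -1)
    return l, t, perm


rng = np.random.default_rng(0)
m = rng.standard_normal((6, 6))
a = m + m.T  # symmetric, generally indefinite
l, t, perm = parlett_reid_ltlt(a)
residual = np.linalg.norm(a[np.ix_(perm, perm)] - l @ t @ l.T)
print(f"residual = {residual:.2e}")
```

Because each pivot is the largest entry in its column, every multiplier stored in L has magnitude at most 1, which is the role pivoting plays in keeping the factorization stable. Each step depends on the column produced by the previous one, which gives a small-scale view of why parallelism in this family of algorithms is limited.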