Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures

Abstract

Factorization of a dense symmetric indefinite matrix is a key computational kernel in many scientific and engineering simulations. However, there is no scalable factorization algorithm that takes advantage of the symmetry and guarantees numerical stability through pivoting at the same time. This is because such an algorithm exhibits many of the fundamental challenges in parallel programming like irregular data accesses and irregular task dependencies. In this paper, we address these challenges in a tiled implementation of a blocked Aasen's algorithm using a dynamic scheduler. To fully exploit the limited parallelism in this left-looking algorithm, we study several performance enhancing techniques, e.g., parallel reduction to update a panel, tall-skinny LU factorization algorithms to factorize the panel, and a parallel implementation of symmetric pivoting. Our performance results on up to 48 AMD Opteron processors demonstrate that our implementation obtains speedups of up to 2.8 over MKL, while losing only one or two digits in the computed residual norms.
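Aasen's method factors a symmetric matrix as P A P^T = L T L^T. Here P is a permutation, L is unit lower triangular with first column e_1, and T is symmetric tridiagonal. The sketch below is a minimal, unblocked, sequential version in pure Python. It assumes a dense symmetric input given as a list of lists and uses the classical column recurrence obtained from A = L H, where H = T L^T is upper Hessenberg.

It is not the blocked, tiled, dynamically scheduled implementation the paper describes. The function names `aasen_ltlt` and `ltlt_product` are hypothetical. Each pivot step swaps a row and the matching column together, which is the symmetric pivoting operation whose parallel implementation the paper studies.

```python
def aasen_ltlt(A):
    """Unblocked Aasen factorization (illustrative sketch): P A P^T = L T L^T.

    Returns (perm, L, alpha, beta): entry (i, j) of P A P^T is A[perm[i]][perm[j]],
    L is unit lower triangular with first column e_0, and T is symmetric
    tridiagonal with diagonal alpha and off-diagonal beta.
    """
    n = len(A)
    W = [list(map(float, row)) for row in A]  # working copy, permuted symmetrically
    perm = list(range(n))
    L = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    alpha = [0.0] * n
    beta = [0.0] * max(n - 1, 0)
    for j in range(n):
        # h[i] = H[i][j] for i <= j, where H = T L^T is upper Hessenberg.
        h = [0.0] * (j + 1)
        for i in range(j):
            s = alpha[i] * L[j][i] + beta[i] * L[j][i + 1]
            if i > 0:
                s += beta[i - 1] * L[j][i - 1]
            h[i] = s
        # Diagonal of H from row j of A = L H, then alpha_j from H = T L^T.
        h[j] = W[j][j] - sum(L[j][i] * h[i] for i in range(j))
        alpha[j] = h[j] - (beta[j - 1] * L[j][j - 1] if j > 0 else 0.0)
        if j == n - 1:
            break
        # Below the diagonal, the residual of column j equals beta_j * L[:, j+1].
        v = [0.0] * n
        for r in range(j + 1, n):
            v[r] = W[r][j] - sum(L[r][i] * h[i] for i in range(j + 1))
        # Symmetric pivoting: move the largest |v[r]| to position j+1.
        p = max(range(j + 1, n), key=lambda r: abs(v[r]))
        q = j + 1
        if p != q:
            W[p], W[q] = W[q], W[p]              # swap rows of the working matrix
            for row in W:                        # swap the matching columns
                row[p], row[q] = row[q], row[p]
            v[p], v[q] = v[q], v[p]
            for c in range(q):                   # swap the already computed part of L
                L[p][c], L[q][c] = L[q][c], L[p][c]
            perm[p], perm[q] = perm[q], perm[p]
        beta[j] = v[q]
        if beta[j] != 0.0:
            for r in range(j + 2, n):
                L[r][q] = v[r] / beta[j]         # |L[r][q]| <= 1 thanks to pivoting
    return perm, L, alpha, beta


def ltlt_product(L, alpha, beta):
    """Form L T L^T explicitly (for checking the sketch only)."""
    n = len(L)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = alpha[i]
        if i + 1 < n:
            T[i][i + 1] = T[i + 1][i] = beta[i]
    LT = [[sum(L[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(LT[i][k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
    perm, L, alpha, beta = aasen_ltlt(A)
    M = ltlt_product(L, alpha, beta)
    err = max(abs(M[i][j] - A[perm[i]][perm[j]]) for i in range(3) for j in range(3))
    print("perm", perm, "alpha", alpha, "beta", beta, "max error", err)
```

Take the 3x3 demo matrix as an example. At step 0 the pivot search swaps rows and columns 1 and 2, which yields perm == [0, 2, 1], alpha == [0.0, 0.0, -3.0] and beta == [2.0, 3.0]. The blocked algorithm in the paper updates a panel by parallel reduction and factorizes it with a tall-skinny LU. This sketch attempts neither and only shows the underlying L T L^T structure.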

Bibliographic Record

Source: IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2013, pp. 895-907 (13 pages)
Conference location: Boston, MA, USA
Authors: Grey Ballard; Dulceneia Becker; James Demmel; Jack Dongarra; et al.
Original format: PDF

Similar Literature

1. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures [J]. Azzam Haidar, Hatem Ltaief, Asim YarKhan. Concurrency and Computation: Practice and Experience, 2012, no. 3.
2. Scheduling Two-Sided Transformations Using Tile Algorithms on Multicore Architectures [J]. Hatem Ltaief, Jakub Kurzak, Jack Dongarra. Scientific Programming, 2010, no. 1.
3. Scheduling two-sided transformations using tile algorithms on multicore architectures [J]. Hatem Ltaief, Jakub Kurzak, Jack Dongarra. Scientific Programming, 2010, no. 1.
4. Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures [C]. Grey Ballard, Dulceneia Becker, James Demmel. IEEE International Parallel and Distributed Processing Symposium, 2013.
5. Implementation of a dynamic programming algorithm for DNA sequence alignment on the Cell Matrix(TM) architecture [D]. Bin Wang. 2002.
6. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards [O]. Francesc Massanes, Marie Cadennes, Jovan G. Brankov.
7. Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures [O]. Grey Ballard, Dulceneia Becker, James Demmel. 2013.