THE PARALLEL TILED WZ FACTORIZATION ALGORITHM FOR MULTICORE ARCHITECTURES

Beata BYLINA; Jarosław BYLINA

Abstract

The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. We present the design and implementation of a parallel tiled WZ factorization algorithm that can fully exploit such architectures. Three parallel implementations of the algorithm are studied. The first relies only on multithreaded BLAS (basic linear algebra subprograms) operations. The second, in addition to BLAS operations, uses OpenMP loop-level parallelism. The third, in addition to BLAS operations, uses the OpenMP task directive with the depend clause. We report the computational performance and the speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense square diagonally dominant matrices. We then compare our parallel implementations with the corresponding LU factorization from a vendor-implemented LAPACK library, and we also analyze the numerical accuracy. Two of our implementations achieve a speedup close to the theoretical maximum implied by Amdahl's law.
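
For readers unfamiliar with the bound mentioned in the last sentence, the following is a minimal sketch of Amdahl's law, S(p) = 1 / ((1 - f) + f / p), where f is the fraction of the run time that parallelizes and p is the number of cores. The function names and the sample values of f and p are assumptions chosen for illustration; they are not taken from the paper.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the work scales perfectly."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def fraction_of_bound(measured_speedup: float, parallel_fraction: float, cores: int) -> float:
    """How close a measured speedup comes to the Amdahl bound (1.0 means it is reached)."""
    return measured_speedup / amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical values: 90% of the run time parallelizes, 10 cores available.
    bound = amdahl_speedup(0.9, 10)
    print(f"Amdahl bound: {bound:.3f}")
    # Hypothetical measured speedup of 5.0 on the same machine.
    print(f"Fraction of bound reached: {fraction_of_bound(5.0, 0.9, 10):.3f}")
```

Under this reading, a "near maximal" implementation is one whose measured speedup divided by the bound is close to 1 for the core counts tested.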

Bibliographic details

Source: International Journal of Applied Mathematics and Computer Science, 2019, No. 2, pp. 407-419 (13 pages)
Authors: Beata BYLINA; Jarosław BYLINA
Affiliation: Institute of Mathematics, Marie Curie-Skłodowska University, Pl. M. Curie-Skłodowskiej 5, 20-031 Lublin, Poland
Original format: PDF
Language: English
Keywords: tiled algorithm; WZ factorization; solution of linear systems; Amdahl's law; high performance computing; multicore architectures
