Journal: International Journal of Applied Mathematics and Computer Science

THE PARALLEL TILED WZ FACTORIZATION ALGORITHM FOR MULTICORE ARCHITECTURES

Abstract

The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. The design and implementation of a parallel tiled WZ factorization algorithm that can fully exploit such architectures are presented. Three parallel implementations of the algorithm are studied. The first relies only on multithreaded BLAS (Basic Linear Algebra Subprograms) operations. The second, in addition to BLAS operations, uses the OpenMP standard to exploit loop-level parallelism. The third, in addition to BLAS operations, uses the OpenMP task directive with the depend clause. We report the computational performance and speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense square diagonally dominant matrices. We then compare our parallel implementations with the corresponding LU factorization from a vendor-implemented LAPACK library. We also analyze the numerical accuracy. Two of our implementations achieve a speedup close to the theoretical maximum implied by Amdahl's law.
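
For readers unfamiliar with the factorization named in the abstract, the following is a minimal sequential sketch of an unblocked WZ factorization, A = WZ, in plain Python. It is not the tiled parallel algorithm from the paper and uses neither BLAS nor OpenMP. The function names `wz_factorize` and `matmul` are illustrative assumptions. The sketch does no pivoting and assumes that every 2×2 pivot determinant is nonzero; it raises an error otherwise.

```python
def wz_factorize(a):
    """Sequential, unblocked WZ factorization sketch (no pivoting).

    `a` is a square matrix given as a list of rows. Returns (w, z) such
    that a = w * z, where w has a unit diagonal and both factors have the
    butterfly ("bow-tie") sparsity pattern of the WZ factorization.
    """
    n = len(a)
    z = [[float(v) for v in row] for row in a]  # working copy, becomes Z
    w = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    # Step k eliminates columns k and k2 = n-1-k in the rows strictly
    # between k and k2, using rows k and k2 as the two pivot rows.
    for k in range((n - 1) // 2):
        k2 = n - 1 - k
        det = z[k][k] * z[k2][k2] - z[k2][k] * z[k][k2]
        if det == 0.0:
            raise ZeroDivisionError("singular 2x2 pivot block at step %d" % k)
        for i in range(k + 1, k2):
            # Solve the 2x2 system for the two multipliers (Cramer's rule).
            wk = (z[i][k] * z[k2][k2] - z[k2][k] * z[i][k2]) / det
            wk2 = (z[k][k] * z[i][k2] - z[k][k2] * z[i][k]) / det
            w[i][k] = wk
            w[i][k2] = wk2
            # Update the interior of row i with both pivot rows.
            for j in range(k + 1, k2):
                z[i][j] -= wk * z[k][j] + wk2 * z[k2][j]
            # These two entries are eliminated by construction.
            z[i][k] = 0.0
            z[i][k2] = 0.0
    return w, z


def matmul(x, y):
    """Dense matrix product of two lists-of-rows, used to check a = w * z."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    # Small diagonally dominant example matrix (values are arbitrary).
    a = [[10.0, 1.0, 2.0, 3.0],
         [1.0, 12.0, 2.0, 1.0],
         [2.0, 1.0, 9.0, 3.0],
         [1.0, 2.0, 3.0, 11.0]]
    w, z = wz_factorize(a)
    residual = max(abs(p - q)
                   for prow, arow in zip(matmul(w, z), a)
                   for p, q in zip(prow, arow))
    print("max |WZ - A| =", residual)
```

Each outer step works on two pivot rows at once, so the inner row updates for a given step are independent of one another. That independence is the kind of structure that loop-level or task-based parallelism can exploit, although how the paper tiles and schedules the work is not specified in the abstract.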
