PoLaPACK: Parallel Factorization Algorithms with Algorithmic Blocking

Abstract

Because parallel computers differ in their ratio of computation to communication performance, the computational block size that maximizes an algorithm's performance differs from one machine to another. A block size that is too small or too large makes good performance on a machine nearly impossible to achieve. In such a case, improving performance may require a complete redistribution of the data matrix. We present the PoLAPACK factorization routines, including LU, QR, and Cholesky factorizations, which apply "algorithmic blocking" on a 2-dimensional block cyclic data distribution. With algorithmic blocking, it is possible to obtain near-optimal performance irrespective of the physical block size. The routines are implemented on the SGI/Cray T3E and compared with the corresponding ScaLAPACK factorization routines.
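As a rough illustration of the 2-dimensional block cyclic data distribution mentioned in the abstract, the sketch below maps a global matrix entry to the process that owns it and to its position in that process's local array. It assumes 0-based indices and a distribution that starts at process (0, 0). The function name and parameters are hypothetical and are not taken from PoLAPACK or ScaLAPACK.

```python
def block_cyclic_owner(i, j, mb, nb, p_rows, p_cols):
    """Map global entry (i, j) to (process coords, local coords).

    Assumptions (illustrative only): 0-based indices, physical block
    size mb x nb, a p_rows x p_cols process grid, and the first block
    stored on process (0, 0).
    """
    # Index of the physical block containing the entry.
    block_i, block_j = i // mb, j // nb
    # Blocks are dealt out cyclically over the process grid.
    proc = (block_i % p_rows, block_j % p_cols)
    # Local index: number of earlier local blocks times the block size,
    # plus the offset inside the block.
    local = ((block_i // p_rows) * mb + i % mb,
             (block_j // p_cols) * nb + j % nb)
    return proc, local


if __name__ == "__main__":
    # Entry (5, 2) with 2 x 2 blocks on a 2 x 3 process grid.
    print(block_cyclic_owner(5, 2, mb=2, nb=2, p_rows=2, p_cols=3))
```

In this mapping the physical block size (`mb`, `nb`) determines which process owns each entry, so changing it means moving data between processes. This matches the abstract's point that tuning the block size may otherwise require redistributing the matrix.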