首页> 外文期刊>ACM transactions on mathematical software >Distributed SBP Cholesky Factorization Algorithms with Near-Optimal Scheduling
【24h】

Distributed SBP Cholesky Factorization Algorithms with Near-Optimal Scheduling

机译:接近最优调度的分布式SBP Cholesky分解算法

获取原文
获取原文并翻译 | 示例

摘要

The minimal block storage Distributed Square Block Packed (DSBP) format for distributed memory computing on symmetric and triangular matrices is presented. Three algorithm variants (Basic, Static, and Dynamic) of the blocked right-looking Cholesky factorization are designed for the DSBP format, implemented, and evaluated. On our target machine, all variants outperform standard full-storage implementations while saving almost half the storage. Communication overhead is shown to be virtually eliminated by the Static and Dynamic variants, both of which take advantage of hardware parallelism to hide communication costs. The Basic variant is shown to yield comparable or slightly better performance than the full-storage ScaLAPACK routine PDPOTRF while clearly outperformed by both Static and Dynamic. Models of execution assuming zero communication costs and overhead are developed. For medium- and larger-sized problems, the Static schedule is near optimal on our target machine based on comparisons with these models and measurements of synchronization overhead.
机译:提出了用于对称和三角矩阵上的分布式内存计算的最小块存储分布式平方块压缩(DSBP)格式。针对DSBP格式设计,实施和评估了阻塞的右眼Cholesky因式分解的三种算法变体(基本,静态和动态)。在我们的目标计算机上,所有变体都优于标准的全存储实现,同时节省了将近一半的存储。事实证明,静态和动态变量实际上消除了通信开销,这两种变量都利用硬件并行性来隐藏通信成本。与基本存储方式的ScaLAPACK例行程序PDPOTRF相比,Basic变体的性能可与之媲美或略胜一筹,而静态和动态均明显优于后者。开发了假设通信成本和开销为零的执行模型。对于中型和大型问题,基于与这些模型的比较和同步开销的测量,静态计划在我们的目标计算机上几乎是最佳的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号