Conference: Scalable High Performance Computing Conference

Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers

Abstract

Sparse Cholesky factorization has historically achieved extremely low performance on distributed memory multiprocessors. Three issues must be addressed to improve this situation: (1) parallel factorization methods must be based on more efficient sequential methods; (2) parallel machines must provide higher interprocessor communication bandwidth; and (3) the sparse matrices used to evaluate parallel sparse factorization performance should be more representative of the sizes of matrices people would factor on large parallel machines. All of these issues have in fact already been addressed. Specifically: (1) single-node performance can be improved by moving from a column-oriented approach, where the computational kernel is Level 1 BLAS, to either a panel- or block-oriented approach, where the kernel is Level 3 BLAS; (2) communication hardware has improved dramatically, with new parallel computers providing higher communication bandwidth than previous parallel computers; and (3) several larger benchmark matrices are now available, and newer parallel machines offer sufficient memory per node to factor these larger matrices. The result of addressing these three issues is extremely high performance on moderately parallel machines. This paper demonstrates performance levels of 650 double-precision MFLOPS on 32 processors of the Intel Paragon system, 1 GFLOPS on 64 processors, and 1.7 GFLOPS on 128 processors. This paper also does a direct performance comparison between the iPSC/860 and Paragon systems, as well as a comparison between panel- and block-oriented approaches to parallel factorization.
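To put the reported Paragon figures in perspective, the aggregate rates can be converted to per-processor rates. The following is only a rough sketch: it assumes that "1 GFLOPS" means 1000 MFLOPS, and the function names are illustrative, not taken from the paper. The abstract does not state a sequential baseline or whether the three figures come from the same matrix, so the ratios below compare per-processor throughput only and should not be read as true parallel efficiency.

```python
# Throughput figures quoted in the abstract for the Intel Paragon:
# (processor count, aggregate double-precision MFLOPS).
# Assumption: 1 GFLOPS is taken as 1000 MFLOPS.
REPORTED = [(32, 650.0), (64, 1000.0), (128, 1700.0)]


def per_processor_mflops(results):
    """Return {processors: aggregate MFLOPS / processors}."""
    return {p: mflops / p for p, mflops in results}


def relative_rate(results):
    """Per-processor rate of each run relative to the smallest run listed.

    This is not efficiency against a sequential baseline (the abstract
    gives none); it only compares per-processor throughput across runs.
    """
    rates = per_processor_mflops(results)
    base = rates[min(rates)]  # rate of the run with the fewest processors
    return {p: rate / base for p, rate in rates.items()}


if __name__ == "__main__":
    rates = per_processor_mflops(REPORTED)
    rel = relative_rate(REPORTED)
    for p, _ in REPORTED:
        print(f"{p:4d} processors: {rates[p]:6.2f} MFLOPS/processor, "
              f"{rel[p]:.2f}x the 32-processor rate")
```

Under these assumptions the per-processor rate falls from roughly 20 MFLOPS on 32 processors to roughly 13 MFLOPS on 128 processors, while aggregate throughput still grows with machine size.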