ACM/IEEE conference on Supercomputing

Sparse LU factorization with partial pivoting on distributed memory machines


Abstract

Sparse LU factorization with partial pivoting is important to many scientific applications, but effective parallelization of this algorithm is still an open problem. The main difficulty is that partial pivoting makes the structures of the L and U factors unpredictable beforehand. This paper presents a novel approach called S* for parallelizing this problem on distributed memory machines. S* incorporates static symbolic factorization to avoid run-time control overhead and uses nonsymmetric L/U supernode partitioning and amalgamation strategies to maximize the use of BLAS-3 routines. The irregular task parallelism embedded in sparse LU is exploited using graph scheduling and efficient run-time support techniques, which optimize communication, overlap computation with communication, and balance processor loads. Experimental results on the Cray-T3D with a set of Harwell-Boeing nonsymmetric matrices are very encouraging, and good scalability has been achieved. Even compared to a highly optimized sequential code, the parallel speedups are still impressive considering the current status of sparse LU research.
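To make the idea of static symbolic factorization more concrete, here is a minimal sketch of one well-known way to bound the nonzero structure of L and U before any numerical work: at each elimination step, every row that could be chosen as the pivot receives the union of the structures of all candidate rows. The abstract does not say which scheme S* actually uses, so the function name `static_symbolic_factorization`, the row-set representation, and the choice of this row-merge scheme are all assumptions made for illustration.

```python
def static_symbolic_factorization(rows):
    """Return an upper bound on the nonzero structure of the LU factors.

    `rows` is a list of sets; rows[i] holds the column indices of the
    structural nonzeros in row i of a square sparse matrix.
    Hypothetical helper for illustration, not code from the paper.
    """
    n = len(rows)
    struct = [set(r) for r in rows]  # copy so the input is not modified
    for k in range(n):
        # Any row at or below k with a structural nonzero in column k
        # could be selected as the pivot row by partial pivoting.
        candidates = [i for i in range(k, n) if k in struct[i]]
        # Merge the trailing parts (columns >= k) of all candidate rows.
        merged = set()
        for i in candidates:
            merged |= {j for j in struct[i] if j >= k}
        # Every candidate receives the merged structure, so the result
        # covers whichever row ends up being the pivot at run time.
        for i in candidates:
            struct[i] |= merged
    return struct


if __name__ == "__main__":
    pattern = [{0, 2}, {0, 1}, {1, 2}]
    for i, cols in enumerate(static_symbolic_factorization(pattern)):
        print(i, sorted(cols))
```

Because the predicted structure covers every possible pivot choice, it generally overestimates the true fill-in; the benefit is that data structures and task schedules can be fixed before factorization begins.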

