Solving a system of equations of the form Tx=y, where T is asparse triangular matrix, is required after the factorization phase inthe direct methods of solving systems of linear equations. A fewparallel formulations have been proposed recently. The common belief inparallelizing this problem is that the parallel formulation utilizing atwo dimensional distribution of T is unscalable. We propose the firstknown efficient scalable parallel algorithm which uses a two dimensionalblock cyclic distribution of T. The algorithm is shown to be applicableto dense as well as sparse triangular solvers. Since most of the knownhighly scalable algorithms employed in the factorization phase yield atwo dimensional distribution of T, our algorithm avoids theredistribution cost incurred by the one dimensional algorithms. Wepresent the parallel runtime and scalability analyses of the proposedtwo dimensional algorithm. The dense triangular solver is shown to bescalable. The sparse triangular solver is shown to be at least asscalable as the dense solver. We also show that it is optimal for oneclass of sparse systems. The experimental results of the sparsetriangular solver show that it has good speedup characteristics andyields high performance for a variety of sparse systems
展开▼