Journal: Parallel Computing

Parallel, multigrain iterative solvers for hiding network latencies on MPPs and networks of clusters

Abstract

Parallel iterative solvers are often the only means of solving large linear systems and eigen-problems. However, these solvers are usually implemented in a fine-grain manner and can incur significant performance penalties due to synchronization overheads on large MPPs. This problem is exacerbated in clusters of workstations (COWs) and SMPs that are interconnected via a hierarchy of networks. In this paper, we describe a novel scheme for hiding the synchronization overheads, and thus improving scalability, of block iterative solvers that employ a correction equation through an inner iterative method. Block methods are not only robust in the presence of eigenvalue multiplicities and multiple right-hand sides, but provide better latency tolerance by performing more floating-point operations between synchronizations. We take a different approach to inducing latency tolerance by increasing the granularity at which the correction equation is solved for each block vector. This is accomplished by splitting the processors into smaller subgroups which are then used to solve the correction for each block vector concurrently. The rest of the algorithm is still performed in fine grain. We call this combination of fine and coarse-grain parallelism multigrain parallelism. We implemented a multigrain, block Jacobi-Davidson algorithm for computing the extreme eigenvalues of a symmetric matrix. We obtained improvements of 45-50% over both the block and non-block implementations of the fine-grain method when testing on an IBM SP and on a collection of clusters of Sun workstations.
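
The processor-splitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `subgroup_color` and `partition`, and the choice of contiguous, near-equal subgroups, are assumptions made for illustration only. The sketch maps each processor rank to one of `block_size` subgroups, one subgroup per block vector, so that each subgroup could solve its own correction equation concurrently. In an MPI program, a value like this would typically be passed as the color argument of `MPI_Comm_split`.

```python
from collections import defaultdict


def subgroup_color(rank: int, num_procs: int, block_size: int) -> int:
    """Return the subgroup index (0 .. block_size-1) for a processor rank.

    Hypothetical helper: contiguous ranks share a subgroup, and subgroup
    sizes differ by at most one processor.
    """
    if not 0 <= rank < num_procs:
        raise ValueError("rank out of range")
    if not 1 <= block_size <= num_procs:
        raise ValueError("need 1 <= block_size <= num_procs")
    # Integer scaling of the rank onto the range of subgroup indices.
    return rank * block_size // num_procs


def partition(num_procs: int, block_size: int) -> dict[int, list[int]]:
    """Group all ranks by subgroup index (illustrative only)."""
    groups = defaultdict(list)
    for rank in range(num_procs):
        groups[subgroup_color(rank, num_procs, block_size)].append(rank)
    return dict(groups)
```

For example, `partition(8, 4)` yields four subgroups of two processors each. The inner solves would then synchronize only within a subgroup, while the rest of the algorithm would still run across all processors in fine grain.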
