Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

Abstract

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve the performance of Krylov methods on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. Second, we outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning, and a speedup of up to 1.7x over GMRES with preconditioning, in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.
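
To make the two ideas in the abstract concrete, the following is a minimal single-process sketch in Python with NumPy. It is not the authors' implementation: a plain block Jacobi preconditioner stands in for their domain decomposition preconditioners, an unscaled monomial basis stands in for whatever basis CA-GMRES uses in practice, and the names `block_jacobi_solve`, `s_step_cycle`, and the test matrix are hypothetical. The sketch builds s + 1 basis vectors of the right-preconditioned Krylov subspace without any intermediate orthogonalization, orthogonalizes them with one QR factorization, and then solves a small least-squares problem for the minimum-residual update.

```python
import numpy as np

def block_jacobi_solve(A, blocks, r):
    """Apply M^{-1} r, where M is the block diagonal of A (assumed stand-in
    preconditioner). Each (lo, hi) block is solved on its own, so no data
    crosses subdomain boundaries."""
    z = np.empty_like(r)
    for lo, hi in blocks:
        z[lo:hi] = np.linalg.solve(A[lo:hi, lo:hi], r[lo:hi])
    return z

def s_step_cycle(A, b, blocks, s):
    """One s-step cycle from x0 = 0 with right preconditioning."""
    n = b.size
    beta = np.linalg.norm(b)
    # Monomial basis K = [v, (A M^-1) v, ..., (A M^-1)^s v] with v = b / beta,
    # generated without orthogonalizing in between.
    K = np.empty((n, s + 1))
    K[:, 0] = b / beta
    for j in range(s):
        K[:, j + 1] = A @ block_jacobi_solve(A, blocks, K[:, j])
    # A single QR replaces the per-iteration orthogonalization of standard GMRES.
    Q, R = np.linalg.qr(K)
    # Since K[:, 1:] = (A M^-1) K[:, :s], minimizing ||b - K[:, 1:] y|| reduces
    # to the small (s+1) x s problem ||beta * R[:, 0] - R[:, 1:] y||.
    y = np.linalg.lstsq(R[:, 1:], beta * R[:, 0], rcond=None)[0]
    x = block_jacobi_solve(A, blocks, K[:, :s] @ y)
    return x, Q

# Hypothetical nonsymmetric test problem: a tridiagonal matrix split into 4 subdomains.
n, nblocks, s = 64, 4, 4
A = 4.0 * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
b = np.ones(n)
size = n // nblocks
blocks = [(i * size, (i + 1) * size) for i in range(nblocks)]

x, Q = s_step_cycle(A, b, blocks, s)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print("relative residual after one s-step cycle:", rel_res)
```

On a distributed machine the matrix-vector products and the QR factorization are where communication would occur; the sketch only shows the algebraic structure, in which the preconditioner application itself is purely local to each block.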
