Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster


Abstract

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication-avoiding (CA) techniques can improve the performance of Krylov methods on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies with two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CA-GMRES, for solving nonsymmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. Second, we outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the subdomain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that, in total solution time, CA-GMRES with preconditioning achieves a speedup of up to 7.4x over CA-GMRES without preconditioning and a speedup of up to 1.7x over GMRES with preconditioning. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.


Bibliographic details

Source
International Conference for High Performance Computing, Networking, Storage and Analysis | 2014 | pp. 933-944 | 12 pages
Authors
Yamazaki Ichitaro; Rajamanickam Sivasankaran; Boman Erik G.; Hoemmen Mark; Heroux Michael A.; Tomov Stanimire
Original format: PDF
Keywords
graphics processing units; iterative methods; mathematics computing; CA techniques; CAGMRES; Intel Xeon CPU; Krylov subspace projection methods; Nvidia Fermi GPU; communication-avoiding Krylov method; domain decomposition preconditioners; generalized minimum residual method; hybrid CPU-GPU cluster; large-scale linear systems of equations; Central Processing Unit; Graphics processing units; Jacobian matrices; Kernel; Linear systems; Sparse matrices; Vectors


Similar literature


1. A new era in scientific computing: Domain decomposition methods in hybrid CPU-GPU architectures [J]. M. Papadrakakis, G. Stavroulakis, A. Karatarakis. Computer Methods in Applied Mechanics and Engineering, 2011, no. 13-16
2. Overlapping communication and computation of GPU/CPU heterogeneous parallel spatial domain decomposition MOC method [J]. Liang Liang, Zhang Qian, Song Peitao. Annals of Nuclear Energy, 2020, Jan. issue
3. Implementation of the CPU/GPU hybrid parallel method of characteristics neutron transport calculation using the heterogeneous cluster with dynamic workload assignment [J]. Song Peitao, Zhang Zhijian, Zhang Qian. Annals of Nuclear Energy, 2020, Jan. issue
4. Communication-Avoiding Tile QR Decomposition on CPU/GPU Heterogeneous Cluster System [C]. Masatoshi Takayanagi, Tomohiro Suzuki. IEEE International Symposium on Embedded Multicore/Many-core System-on-Chip, 2018
5. An MPI-CUDA implementation of a model for calcium induced calcium release in a three-dimensional heart cell on a hybrid CPU/GPU cluster [D]. Huang, Xuan. 2015
6. Efficient methods for implementation of multi-level nonrigid mass-preserving image registration on GPUs and multi-threaded CPUs [O]. Nathan D. Ellingwood, Youbing Yin, Matthew Smith
7. Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster [O]. Ichitaro Yamazaki, Jakub Kurzak, Piotr Luszczek. 2015
