
Communication-avoiding Krylov subspace methods.



Abstract

Krylov subspace methods (KSMs) are iterative algorithms for solving large, sparse linear systems and eigenvalue problems. Current KSMs rely on sparse matrix-vector multiply (SpMV) and vector-vector operations (like dot products and vector sums). All of these operations are communication-bound. Furthermore, data dependencies between them mean that only a small amount of that communication can be hidden. Many important scientific and engineering computations spend much of their time in Krylov methods, so the performance of many codes could be improved by introducing KSMs that communicate less.

Our goal is to take s steps of a KSM for the same communication cost as 1 step, which would be optimal. We call the resulting KSMs "communication-avoiding Krylov methods."
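As a rough illustration of this s-step idea (and of the kernels in contributions (1) and (2) below), the following sketch builds the monomial basis [v, Av, ..., A^s v] with a plain loop and then orthogonalizes the whole tall, skinny block with a single QR factorization standing in for TSQR. It is only a sketch of the shape of the computation: the blocking that actually lets the matrix powers kernel avoid communication is not shown, and the function names and the toy 1-D Laplacian test problem are ours, not the thesis's.

    import numpy as np
    import scipy.sparse as sp

    def monomial_s_step_basis(A, v, s):
        # Build the s-step (monomial) Krylov basis [v, A v, ..., A^s v].
        # A communication-avoiding matrix powers kernel would block these
        # s SpMVs so that A and the needed vector entries are read once;
        # this plain loop only shows what is computed, not how.
        V = np.empty((A.shape[0], s + 1))
        V[:, 0] = v
        for j in range(s):
            V[:, j + 1] = A @ V[:, j]
        return V

    def block_orthogonalize(V):
        # One QR of the tall, skinny block, standing in for TSQR: a single
        # reduction instead of the many dot-product messages that Modified
        # Gram-Schmidt applied vector by vector would need.
        Q, R = np.linalg.qr(V)
        return Q, R

    # Toy usage (hypothetical test problem): 1-D Laplacian, s = 4.
    n, s = 1000, 4
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    v = np.random.default_rng(0).standard_normal(n)
    v /= np.linalg.norm(v)
    Q, R = block_orthogonalize(monomial_s_step_basis(A, v, s))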
This thesis makes the following contributions:

(1) We have fast kernels replacing SpMV that can compute the results of s calls to SpMV for the same communication cost as one call (Section 2.1).

(2) We also have fast dense kernels, such as Tall Skinny QR (TSQR, Section 2.3) and Block Gram-Schmidt (BGS, Section 2.4), which can do the work of Modified Gram-Schmidt applied to s vectors with a factor of Θ(s²) fewer messages in parallel, and a factor of Θ(s/W) fewer words transferred between levels of the memory hierarchy (where W is the fast memory capacity in words).

(3) We have new communication-avoiding Block Gram-Schmidt algorithms for orthogonalization in more general inner products (Section 2.5).

(4) We have new communication-avoiding versions of the following Krylov subspace methods for solving linear systems: the Generalized Minimum Residual method (GMRES, Section 3.4), both unpreconditioned and preconditioned, and the Method of Conjugate Gradients (CG), both unpreconditioned (Section 5.4) and left-preconditioned (Section 5.5).

(5) We have new communication-avoiding versions of the following Krylov subspace methods for solving eigenvalue problems, both standard (Ax = λx, for a nonsingular matrix A) and "generalized" (Ax = λMx, for nonsingular matrices A and M): Arnoldi iteration (Section 3.3), and Lanczos iteration, both for Ax = λx (Section 4.2) and Ax = λMx (Section 4.3).

(6) We propose techniques for developing communication-avoiding versions of nonsymmetric Lanczos iteration (for solving nonsymmetric eigenvalue problems Ax = λx) and of the Method of Biconjugate Gradients (BiCG) for solving linear systems.

(7) We can combine more stable numerical formulations that use different bases of Krylov subspaces with our techniques for avoiding communication. For a discussion of different bases, see Chapter 7. For an example of how the choice of basis affects the formulation of the Krylov method, see Section 3.2.2.

(8) We have faster numerical formulations. For example, in our communication-avoiding version of GMRES, CA-GMRES (see Section 3.4), we can pick the restart length r independently of the s-step basis length s. Experiments in Section 3.5.5 show that this ability improves numerical stability. We show in Section 3.6.3 that it also improves performance in practice, resulting in a 2.23x speedup in the CA-GMRES implementation described below.

(9) We combine all of these numerical and performance techniques in a shared-memory parallel implementation of our communication-avoiding version of GMRES, CA-GMRES. Compared to a similarly highly optimized version of standard GMRES, when both are running in parallel on 8 cores of an Intel Clovertown (see Appendix A), CA-GMRES achieves 4.3x speedups over standard GMRES on standard sparse test matrices (described in Appendix B.5). When both are running in parallel on 8 cores of an Intel Nehalem (see Appendix A), CA-GMRES achieves 4.1x speedups. See Section 3.6 for performance results and Section 3.5 for the corresponding numerical experiments. We first reported performance results for this implementation on the Intel Clovertown platform in Demmel et al. [78].

(10) We have incorporated preconditioning into our methods. Note that we have not yet developed practical communication-avoiding preconditioners; this is future work. We have accomplished the following: (a) We show (in Sections 2.2 and 4.3) what the s-step basis should compute in the preconditioned case for many different types of Krylov methods and s-step bases; we explain why this is hard in Section 4.3, and a rough sketch of such a preconditioned basis computation appears after this abstract. (b) We have identified two different structures that a preconditioner may have in order to achieve the desired optimal reduction of communication by a factor of s. See Section 2.2 for details.

(Abstract shortened by UMI.)
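As a hedged illustration of contribution (10)(a), the sketch below shows what a left-preconditioned s-step basis produces in the simplest, monomial case: the vectors v, (M⁻¹A)v, ..., (M⁻¹A)^s v. It applies M⁻¹ through a generic sparse factorization with a toy Jacobi (diagonal) preconditioner, so it illustrates only what is computed; it does not have the special preconditioner structure of Section 2.2 that would actually reduce communication by a factor of s. The matrix, preconditioner, and function names here are ours, chosen for the example.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def left_preconditioned_s_step_basis(A, M, v, s):
        # Simplest (monomial) left-preconditioned s-step basis:
        # v, (M^-1 A) v, ..., (M^-1 A)^s v.  M^-1 is applied through a
        # generic sparse factorization here; a communication-avoiding kernel
        # would instead exploit special structure in M (see Section 2.2).
        apply_Minv = spla.factorized(sp.csc_matrix(M))
        Z = np.empty((A.shape[0], s + 1))
        Z[:, 0] = v
        for j in range(s):
            Z[:, j + 1] = apply_Minv(A @ Z[:, j])
        return Z

    # Toy usage (hypothetical): 1-D Laplacian with a Jacobi preconditioner.
    n, s = 1000, 4
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    M = sp.diags(A.diagonal())
    v = np.ones(n) / np.sqrt(n)
    Z = left_preconditioned_s_step_basis(A, M, v, s)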

Record details

  • Author: Hoemmen, Mark
  • Author affiliation: University of California, Berkeley
  • Degree grantor: University of California, Berkeley
  • Subjects: Mathematics; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 355 p.
  • Total pages: 355
  • Format: PDF
  • Language: English
