首页> 外文学位 >Avoiding Communication in Dense Linear Algebra.
【24h】

Avoiding Communication in Dense Linear Algebra.

机译:在密集线性代数中避免通信。

获取原文
获取原文并翻译 | 示例

摘要

Dense linear algebra computations are essential to nearly every problem in scientific computing and to countless other fields. Most matrix computations enjoy a high computational intensity (i.e., ratio of computation to data), and therefore the algorithms for the computations have a potential for high efficiency. However, performance for many linear algebra algorithms is limited by the cost of moving data between processors on a parallel computer or throughout the memory hierarchy of a single processor, which we will refer to generally as communication. Technological trends indicate that algorithmic performance will become even more limited by communication in the future. In this thesis, we consider the fundamental computations within dense linear algebra and address the following question: can we significantly improve the current algorithms for these computations, in terms of the communication they require and their performance in practice?;To answer the question, we analyze algorithms on sequential and parallel architectural models that are simple enough to determine coarse communication costs but accurate enough to predict performance of implementations on real hardware. For most of the computations, we prove lower bounds on the communication that any algorithm must perform. If an algorithm exists with communication costs that match the lower bounds (at least in an asymptotic sense), we call the algorithm communication optimal. In many cases, the most commonly used algorithms are not communication optimal, and we can develop new algorithms that require less data movement and attain the communication lower bounds.;In this thesis, we develop both new communication lower bounds and new algorithms, tightening (and in many cases closing) the gap between best known lower bound and best known algorithm (or upper bound). We consider both sequential and parallel algorithms, and we asses both classical and fast algorithms (e.g., Strassen's matrix multiplication algorithm).;In particular, the central contributions of this thesis are: proving new communication lower bounds for nearly all classical direct linear algebra computations (dense or sparse), including factorizations for solving linear systems, least squares problems, and eigenvalue and singular value problems; proving new communication lower bounds for Strassen's and other fast matrix multiplication algorithms; proving new parallel communication lower bounds for classical and fast computations that set limits on an algorithm's ability to perfectly strong scale; summarizing the state-of-the-art in communication efficiency for both sequential and parallel algorithms for the computations to which the lower bounds apply; developing a new communication-optimal algorithm for computing a symmetric-indefinite factorization (observing speedups of up to 2.8x compared to alternative shared-memory parallel algorithms); developing new, more communication-efficient algorithms for reducing a symmetric band matrix to tridiagonal form via orthogonal similar transformations (observing speedups of 2--6x compared to alternative sequential and parallel algorithms); and developing a new communication-optimal parallelization of Strassen's matrix multiplication algorithm (observing speedups of up to 2.84x compared to alternative distributed-memory parallel algorithms).
机译:密集的线性代数计算对于科学计算中几乎所有问题以及无数其他领域都是必不可少的。大多数矩阵计算都具有很高的计算强度(即计算与数据之比),因此用于计算的算法具有提高效率的潜力。但是,许多线性代数算法的性能受到在并行计算机上的处理器之间或单个处理器的整个内存层次结构中移动数据的成本的限制,我们通常将其称为通信。技术趋势表明,未来的通信将进一步限制算法性能。在本文中,我们考虑了稠密线性代数中的基本计算,并提出了以下问题:我们可以根据所需的沟通和在实践中的表现,显着改善当前用于这些计算的算法吗?分析顺序和并行体系结构模型上的算法,这些算法很简单,可以确定粗略的通信成本,但又足够准确,可以预测实际硬件上的实现性能。对于大多数计算,我们证明了任何算法都必须执行的通信下限。如果存在通信成本与下限匹配的算法(至少在渐近意义上),我们将算法称为最优通信。在许多情况下,最常用的算法不是最佳通信,因此我们可以开发需要较少数据移动并达到通信下限的新算法。;在本文中,我们同时开发了新的通信下限和新算法,并在许多情况下缩小)最知名的下限和最知名算法(或上限)之间的差距。我们同时考虑了顺序算法和并行算法,同时评估了经典算法和快速算法(例如Strassen矩阵乘法算法)。特别是,本论文的主要贡献是:证明了几乎所有经典直接线性代数计算的新通信下界(密集或稀疏),包括分解线性系统,最小二乘问题以及特征值和奇异值问题的因式分解;为Strassen和其他快速矩阵乘法算法证明新的通信下界;证明经典和快速计算的新并行通信下界为算法的完美扩展能力设置了限制;总结下限适用的计算的顺序算法和并行算法的通信效率的最新水平;开发一种新的通信最优算法来计算对称不定式分解(与其他共享内存并行算法相比,观察到的加速高达2.8倍);开发新的,通信效率更高的算法,以通过正交相似变换将对称带矩阵简化为对角线形式(与其他顺序和并行算法相比,观察到的加速比是2--6倍);并开发一种新的通信优化Strassen矩阵乘法算法的并行化方法(与替代的分布式内存并行算法相比,观察到的速度提高了2.84倍)。

著录项

  • 作者

    Ballard, Grey Malone.;

  • 作者单位

    University of California, Berkeley.;

  • 授予单位 University of California, Berkeley.;
  • 学科 Computer Science.;Applied Mathematics.
  • 学位 Ph.D.
  • 年度 2013
  • 页码 215 p.
  • 总页数 215
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号