Avoiding Communication in Dense Linear Algebra.

机译：在密集线性代数中避免通信。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

Dense linear algebra computations are essential to nearly every problem in scientific computing and to countless other fields. Most matrix computations enjoy a high computational intensity (i.e., ratio of computation to data), and therefore the algorithms for the computations have a potential for high efficiency. However, performance for many linear algebra algorithms is limited by the cost of moving data between processors on a parallel computer or throughout the memory hierarchy of a single processor, which we will refer to generally as communication. Technological trends indicate that algorithmic performance will become even more limited by communication in the future. In this thesis, we consider the fundamental computations within dense linear algebra and address the following question: can we significantly improve the current algorithms for these computations, in terms of the communication they require and their performance in practice?;To answer the question, we analyze algorithms on sequential and parallel architectural models that are simple enough to determine coarse communication costs but accurate enough to predict performance of implementations on real hardware. For most of the computations, we prove lower bounds on the communication that any algorithm must perform. If an algorithm exists with communication costs that match the lower bounds (at least in an asymptotic sense), we call the algorithm communication optimal. In many cases, the most commonly used algorithms are not communication optimal, and we can develop new algorithms that require less data movement and attain the communication lower bounds.;In this thesis, we develop both new communication lower bounds and new algorithms, tightening (and in many cases closing) the gap between best known lower bound and best known algorithm (or upper bound). We consider both sequential and parallel algorithms, and we asses both classical and fast algorithms (e.g., Strassen's matrix multiplication algorithm).;In particular, the central contributions of this thesis are: proving new communication lower bounds for nearly all classical direct linear algebra computations (dense or sparse), including factorizations for solving linear systems, least squares problems, and eigenvalue and singular value problems; proving new communication lower bounds for Strassen's and other fast matrix multiplication algorithms; proving new parallel communication lower bounds for classical and fast computations that set limits on an algorithm's ability to perfectly strong scale; summarizing the state-of-the-art in communication efficiency for both sequential and parallel algorithms for the computations to which the lower bounds apply; developing a new communication-optimal algorithm for computing a symmetric-indefinite factorization (observing speedups of up to 2.8x compared to alternative shared-memory parallel algorithms); developing new, more communication-efficient algorithms for reducing a symmetric band matrix to tridiagonal form via orthogonal similar transformations (observing speedups of 2--6x compared to alternative sequential and parallel algorithms); and developing a new communication-optimal parallelization of Strassen's matrix multiplication algorithm (observing speedups of up to 2.84x compared to alternative distributed-memory parallel algorithms).

机译：密集的线性代数计算对于科学计算中几乎所有问题以及无数其他领域都是必不可少的。大多数矩阵计算都具有很高的计算强度（即计算与数据之比），因此用于计算的算法具有提高效率的潜力。但是，许多线性代数算法的性能受到在并行计算机上的处理器之间或单个处理器的整个内存层次结构中移动数据的成本的限制，我们通常将其称为通信。技术趋势表明，未来的通信将进一步限制算法性能。在本文中，我们考虑了稠密线性代数中的基本计算，并提出了以下问题：我们可以根据所需的沟通和在实践中的表现，显着改善当前用于这些计算的算法吗？分析顺序和并行体系结构模型上的算法，这些算法很简单，可以确定粗略的通信成本，但又足够准确，可以预测实际硬件上的实现性能。对于大多数计算，我们证明了任何算法都必须执行的通信下限。如果存在通信成本与下限匹配的算法（至少在渐近意义上），我们将算法称为最优通信。在许多情况下，最常用的算法不是最佳通信，因此我们可以开发需要较少数据移动并达到通信下限的新算法。；在本文中，我们同时开发了新的通信下限和新算法，并在许多情况下缩小）最知名的下限和最知名算法（或上限）之间的差距。我们同时考虑了顺序算法和并行算法，同时评估了经典算法和快速算法（例如Strassen矩阵乘法算法）。特别是，本论文的主要贡献是：证明了几乎所有经典直接线性代数计算的新通信下界（密集或稀疏），包括分解线性系统，最小二乘问题以及特征值和奇异值问题的因式分解；为Strassen和其他快速矩阵乘法算法证明新的通信下界；证明经典和快速计算的新并行通信下界为算法的完美扩展能力设置了限制；总结下限适用的计算的顺序算法和并行算法的通信效率的最新水平；开发一种新的通信最优算法来计算对称不定式分解（与其他共享内存并行算法相比，观察到的加速高达2.8倍）；开发新的，通信效率更高的算法，以通过正交相似变换将对称带矩阵简化为对角线形式（与其他顺序和并行算法相比，观察到的加速比是2--6倍）；并开发一种新的通信优化Strassen矩阵乘法算法的并行化方法（与替代的分布式内存并行算法相比，观察到的速度提高了2.84倍）。

著录项

作者
Ballard, Grey Malone.;
展开▼
作者单位

University of California, Berkeley.;

展开▼
授予单位 University of California, Berkeley.;
学科 Computer Science.;Applied Mathematics.
学位 Ph.D.
年度 2013
页码 215 p.
总页数 215
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines [J] . Marc Baboulin, Simplice Donfack, Jack Dongarra, Procedia Computer Science . 2012,第1期

机译：一类用于避免CPU / GPU并行机上通用密集线性系统的通信避免算法
2. Estimation of a high-dense crowd based on a Balanced Communication-Avoiding Support Vector Machine classifier [J] . Shaya A. Alshaya International journal of computer science and network security . 2020,第6期

机译：基于平衡通信支持向量机分类器的高密度人群估算
3. A preconditioned numerical solver for stiff nonlinear reaction-diffusion equations with fractional Laplacians that avoids dense matrices [J] . Simmons Alex, Yang Qianqian, Moroney Timothy Journal of Computational Physics . 2015,第Null期

机译：带分数拉普拉斯算子的刚性非线性反应扩散方程的避免数值矩阵的预处理数值求解器
4. A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines [C] . Marc Baboulin, Simplice Donfack, Jack Dongarra, International Conference on Computational Science . 2013

机译：一类通信避免用于解决CPU / GPU平行机上一般密集线性系统的避免算法
5. The communication complexity of several problems in numerical linear algebra. [D] . Chu, Jeff I-Shang. 1988

机译：数值线性代数中几个问题的通信复杂度。
6. Wideband LTE Power Amplifier with Integrated Novel Analog Pre-Distorter Linearizer for Mobile Wireless Communications [O] . Eswaran Uthirajoo, Harikrishnan Ramiah, Jeevan Kanesan, -1

机译：集成新型模拟预失真线性化器的宽带LTE功率放大器用于移动无线通信
7. A class of communication-avoiding algorithms for solving general dense linear systems on CPU/GPU parallel machines [O] . Baboulin, Marc, Donfack, Simplice, Dongarra, Jack, 2012

机译：用于在CPU / GPU并行机上求解通用密集线性系统的一类避免通信的算法
8. Avoiding Communication in Dense Linear Algebra [R] . Ballard, G M 2013

机译：避免密集线性代数中的通信

Avoiding Communication in Dense Linear Algebra.

摘要

著录项

相似文献

相关主题

期刊订阅