A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices

Abstract

We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed-memory systems. Our 3D sparse LU algorithm uses a three-dimensional MPI process grid, aggressively exploits elimination-tree parallelism, and trades increased memory for reduced per-process communication. We also analyze the asymptotic improvements for planar graphs (e.g., from 2D grid or mesh domains) and certain non-planar graphs (specifically, 3D grids and meshes). For planar graphs with n vertices, our algorithm reduces communication volume asymptotically in n by a factor of O(log n) and latency by a factor of O(log n). For non-planar cases, our algorithm can reduce the per-process communication volume by 3× and latency by a factor of O(n^(1/3)). In all cases, the memory needed to achieve these gains is a constant factor. We implemented our algorithm by extending the 2D data structure used in SuperLU. Our new 3D code achieves speedups of up to 27× for planar graphs and up to 3.3× for non-planar graphs over the baseline 2D SuperLU when run on 24,000 cores of a Cray XC30.
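
The abstract does not spell out how the three-dimensional process grid is organized, so the following is only a minimal sketch of one plausible reading: ranks are arranged as pz stacked 2D layers of size px × py, the pz independent subtrees found at depth log2(pz) of the elimination tree are each given to one layer, and the ancestors above that depth are treated as shared. The function names `rank_to_coords` and `assign_layers`, the parent-array tree representation, and the layer numbering are all illustrative assumptions, not the data structures of SuperLU or of the paper.

```python
def rank_to_coords(rank, px, py, pz):
    """Map a linear rank to (x, y, z) in a px * py * pz grid.

    Assumption: ranks are numbered layer by layer (z-major), then row by row.
    """
    if not 0 <= rank < px * py * pz:
        raise ValueError("rank outside the process grid")
    z, rem = divmod(rank, px * py)
    y, x = divmod(rem, px)
    return x, y, z


def assign_layers(parent, pz):
    """Assign elimination-tree nodes to z-layers.

    parent[v] is the parent of node v; the root has parent -1.
    Nodes shallower than log2(pz) are shared ancestors and get None.
    Every other node gets the layer of the subtree that contains it.
    """
    levels = pz.bit_length() - 1
    if pz < 1 or (1 << levels) != pz:
        raise ValueError("pz must be a power of two")

    n = len(parent)
    depth = [-1] * n
    for v in range(n):
        # Walk up until a node with a known depth (or past the root) is hit.
        path = []
        u = v
        while u != -1 and depth[u] == -1:
            path.append(u)
            u = parent[u]
        d = depth[u] if u != -1 else -1
        for w in reversed(path):
            d += 1
            depth[w] = d

    layer = [None] * n
    root_id = {}  # subtree root -> index in order of discovery
    for v in range(n):
        if depth[v] < levels:
            continue  # shared ancestor, not owned by a single layer
        u = v
        while depth[u] > levels:
            u = parent[u]
        if u not in root_id:
            root_id[u] = len(root_id)
        layer[v] = root_id[u] % pz
    return layer


if __name__ == "__main__":
    # Complete binary tree with 7 nodes: 0 is the root, 3..6 are leaves.
    tree = [-1, 0, 0, 1, 1, 2, 2]
    print(assign_layers(tree, 2))
    print(rank_to_coords(5, 2, 2, 2))
```

In this toy layout each layer can factor its own subtree on its 2D grid without talking to the other layers; only the shared ancestors require communication along the z dimension. Holding copies of those ancestors on more than one layer is where the extra memory mentioned in the abstract would come from under this reading.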

Bibliographic record

Source
IEEE International Parallel and Distributed Processing Symposium | 2018 | pp. 908-919 | 12 pages
Authors
Piyush Sao; Xiaoye Sherry Li; Richard Vuduc
Original format: PDF
Keywords
Sparse matrices; Two dimensional displays; Three-dimensional displays; Particle separators; Parallel processing; Transmission line matrix methods; Matrices;

Similar literature

1. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems [J]. Piyush Sao, Xiaoye S. Li, Richard Vuduc. Journal of Parallel and Distributed Computing. 2019, issue SEPa
2. Parallel LU factorization of sparse matrices on FPGA-based configurable computing engines [J]. Xiaofang Wang, Sotirios G. Ziavras. Concurrency and Computation. 2004, issue 4
3. A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices [C]. Piyush Sao, Xiaoye Sherry Li, Richard Vuduc. IEEE International Parallel and Distributed Processing Symposium. 2018
4. Stable Sparse Orthogonal Factorization of Ill-Conditioned Banded Matrices for Parallel Computing [D]. Huang, Qian. 2017
5. Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms [O]. Jianwen Xie, Pamela K. Douglas, Ying Nian Wu
6. Reducing elimination tree height for parallel LU factorization of sparse unsymmetric matrices [O]. Enver Kayaaslan, Bora Ucar. 2014
7. LU Factorization of Sequences of Identically Structured Sparse Matrices Within a Distributed Memory Environment [R]. Hadfield, S. M. 1994
