Highly scalable parallel algorithms for sparse matrix factorization

Gupta A.; Karypis G.

Abstract

In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is well known that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.
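
As background for the operation the abstract refers to, the following is a minimal serial sketch of Cholesky factorization (A = L·Lᵀ for a symmetric positive definite A) in plain Python. It is not the authors' parallel algorithm and says nothing about their data distribution or communication scheme; the function names `cholesky` and `count_nonzeros` and the small tridiagonal test matrix are illustrative assumptions only.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squares of the entries already computed in row j.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l

def count_nonzeros(m, tol=1e-12):
    """Count entries whose magnitude exceeds tol."""
    return sum(1 for row in m for x in row if abs(x) > tol)

# Illustrative sparse SPD input: a 4x4 tridiagonal matrix (hypothetical data).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
L = cholesky(A)
```

For this tridiagonal input the factor keeps the sparsity of the lower triangle of `A`, so `count_nonzeros(L)` matches the number of nonzeros on and below the diagonal of `A`. General sparse matrices can gain additional nonzeros (fill-in) during factorization, which is one reason sparse factorization is harder to parallelize than the dense case.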

Bibliographic record

Source: IEEE Transactions on Parallel and Distributed Systems, 1997, No. 5, pp. 502-520 (19 pages)
Authors: Gupta A.; Karypis G.
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
