IEEE International Parallel and Distributed Processing Symposium

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

Abstract

We present a distributed-memory algorithm for sparse matrix-matrix multiplication (SpGEMM) of extremely large matrices where the generated output is larger than the aggregate memory of a target supercomputer. We address this challenge by splitting the computation into batches, with each batch generating a set of output columns. We developed a distributed symbolic step to estimate the memory requirement and determine the number of batches beforehand. We integrated the multiplication in each batch with existing communication-avoiding techniques to reduce communication overhead when multiplying matrices on a 3-D process grid. Furthermore, we made the in-node computation faster by designing a sort-free SpGEMM and merging algorithm. Incorporating all the proposed approaches, our SpGEMM scales to large protein-similarity networks using 262,144 cores on a Cray XC40 supercomputer, achieving a 10x speedup when using 16x more nodes. Our code is available as part of the Combinatorial BLAS library (https://github.com/PASSIONLab/CombBLAS).
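The batching idea can be illustrated with a minimal serial sketch in Python, assuming SciPy CSC matrices; the names `column_flops`, `batched_spgemm`, and `memory_budget` are illustrative inventions, not the paper's distributed implementation or the CombBLAS API. The symbolic pass counts the multiply-adds needed for each output column, which upper-bounds that column's number of nonzeros, so the batch boundaries can be fixed before any numeric multiplication:

```python
import numpy as np
import scipy.sparse as sp

def column_flops(A, B):
    """Symbolic step: for each output column j of C = A @ B, count the
    multiply-adds sum over k in B[:, j] of nnz(A[:, k]). This equals the
    flop count and upper-bounds nnz(C[:, j]), hence the column's memory."""
    A, B = A.tocsc(), B.tocsc()
    nnz_per_col_A = np.diff(A.indptr)                # nnz(A[:, k]) for each k
    flops = np.zeros(B.shape[1], dtype=np.int64)
    for j in range(B.shape[1]):
        ks = B.indices[B.indptr[j]:B.indptr[j + 1]]  # row indices of B[:, j]
        flops[j] = nnz_per_col_A[ks].sum()
    return flops

def batched_spgemm(A, B, memory_budget):
    """Split the output columns into batches whose estimated size fits in
    `memory_budget` nonzeros, then multiply one batch at a time."""
    flops = column_flops(A, B)
    batches, start, acc = [], 0, 0
    for j, f in enumerate(flops):
        if acc + f > memory_budget and j > start:    # close the current batch
            batches.append((start, j))
            start, acc = j, 0
        acc += f
    batches.append((start, B.shape[1]))
    # Each batch produces a contiguous block of output columns.
    return [A @ B[:, lo:hi] for lo, hi in batches]

# Tiny usage example: the batched product matches the one-shot product.
A = sp.random(200, 200, density=0.02, format="csc", random_state=0)
B = sp.random(200, 200, density=0.02, format="csc", random_state=1)
C = sp.hstack(batched_spgemm(A, B, memory_budget=2000))
assert np.allclose(C.toarray(), (A @ B).toarray())
```

In the paper's setting the symbolic step and each batch's multiplication run distributed on the 3-D process grid; the serial loop above only shows what the symbolic step estimates and why the number of batches can be determined beforehand.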
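The sort-free in-node kernel can likewise be pictured with a hash-based accumulator: Gustavson's column-by-column algorithm keeps the partial sums for one output column in a hash table keyed by row index, so neither the intermediate products nor the merged result ever needs to be sorted. A minimal sketch under the same SciPy CSC assumption, with the hypothetical name `hash_spgemm_column`:

```python
import numpy as np
import scipy.sparse as sp

def hash_spgemm_column(A, B, j):
    """Compute column j of C = A @ B by Gustavson's algorithm, accumulating
    into a dict keyed by row index instead of sorting and merging triples.
    A and B are assumed to be CSC matrices with matching inner dimension."""
    acc = {}
    for b_idx in range(B.indptr[j], B.indptr[j + 1]):
        k, b_kj = B.indices[b_idx], B.data[b_idx]    # entry B[k, j]
        for a_idx in range(A.indptr[k], A.indptr[k + 1]):
            i = A.indices[a_idx]                     # entry A[i, k]
            acc[i] = acc.get(i, 0.0) + A.data[a_idx] * b_kj
    return acc  # {row index: value} for the nonzeros of C[:, j]

# Check column 0 against SciPy's own SpGEMM.
A = sp.random(100, 100, density=0.05, format="csc", random_state=2)
B = sp.random(100, 100, density=0.05, format="csc", random_state=3)
col0 = hash_spgemm_column(A, B, 0)
ref = (A @ B).tocsc()[:, 0].tocoo()
assert all(np.isclose(col0[i], v) for i, v in zip(ref.row, ref.data))
```

In the distributed algorithm a similar accumulation merges the partial results received from other processes; this serial kernel only illustrates why no sorting step is required.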
