IEEE International Parallel and Distributed Processing Symposium

Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale

Abstract

We present a distributed-memory algorithm for sparse matrix-matrix multiplication (SpGEMM) of extremely large matrices where the generated output is larger than the aggregate memory of a target supercomputer. We address this challenge by splitting the computation into batches, with each batch generating a set of output columns. We developed a distributed symbolic step to estimate the memory requirement and determine the number of batches beforehand. We integrated the multiplication in each batch with existing communication-avoiding techniques to reduce communication overhead when multiplying matrices on a 3-D process grid. Furthermore, we made the in-node computation faster by designing a sort-free SpGEMM and merging algorithm. Incorporating all the proposed approaches, our SpGEMM scales to large protein-similarity networks using 262,144 cores on a Cray XC40 supercomputer, achieving a 10x speedup when using 16x more nodes. Our code is available as part of the Combinatorial BLAS library (https://github.com/PASSIONLab/CombBLAS).
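The batching idea can be illustrated with a minimal serial sketch in Python, assuming SciPy CSC matrices; the names `column_flops`, `batched_spgemm`, and `memory_budget` are illustrative inventions, not the paper's distributed implementation or the CombBLAS API. The symbolic pass counts the multiply-adds needed for each output column, which upper-bounds that column's number of nonzeros, so the batch boundaries can be fixed before any numeric multiplication:

```python
import numpy as np
import scipy.sparse as sp

def column_flops(A, B):
    """Symbolic step: for each output column j of C = A @ B, count the
    multiply-adds sum over k in B[:, j] of nnz(A[:, k]). This equals the
    flop count and upper-bounds nnz(C[:, j]), hence the column's memory."""
    A, B = A.tocsc(), B.tocsc()
    nnz_per_col_A = np.diff(A.indptr)                # nnz(A[:, k]) for each k
    flops = np.zeros(B.shape[1], dtype=np.int64)
    for j in range(B.shape[1]):
        ks = B.indices[B.indptr[j]:B.indptr[j + 1]]  # row indices of B[:, j]
        flops[j] = nnz_per_col_A[ks].sum()
    return flops

def batched_spgemm(A, B, memory_budget):
    """Split the output columns into batches whose estimated size fits in
    `memory_budget` nonzeros, then multiply one batch at a time."""
    flops = column_flops(A, B)
    batches, start, acc = [], 0, 0
    for j, f in enumerate(flops):
        if acc + f > memory_budget and j > start:    # close the current batch
            batches.append((start, j))
            start, acc = j, 0
        acc += f
    batches.append((start, B.shape[1]))
    # Each batch produces a contiguous block of output columns.
    return [A @ B[:, lo:hi] for lo, hi in batches]

# Tiny usage example: the batched product matches the one-shot product.
A = sp.random(200, 200, density=0.02, format="csc", random_state=0)
B = sp.random(200, 200, density=0.02, format="csc", random_state=1)
C = sp.hstack(batched_spgemm(A, B, memory_budget=2000))
assert np.allclose(C.toarray(), (A @ B).toarray())
```

In the paper's setting the symbolic step and each batch's multiplication run distributed on the 3-D process grid; the serial loop above only shows what the symbolic step estimates and why the number of batches can be determined beforehand.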
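The sort-free in-node kernel can likewise be pictured with a hash-based accumulator: Gustavson's column-by-column algorithm keeps the partial sums for one output column in a hash table keyed by row index, so neither the intermediate products nor the merged result ever needs to be sorted. A minimal sketch under the same SciPy CSC assumption, with the hypothetical name `hash_spgemm_column`:

```python
import numpy as np
import scipy.sparse as sp

def hash_spgemm_column(A, B, j):
    """Compute column j of C = A @ B by Gustavson's algorithm, accumulating
    into a dict keyed by row index instead of sorting and merging triples.
    A and B are assumed to be CSC matrices with matching inner dimension."""
    acc = {}
    for b_idx in range(B.indptr[j], B.indptr[j + 1]):
        k, b_kj = B.indices[b_idx], B.data[b_idx]    # entry B[k, j]
        for a_idx in range(A.indptr[k], A.indptr[k + 1]):
            i = A.indices[a_idx]                     # entry A[i, k]
            acc[i] = acc.get(i, 0.0) + A.data[a_idx] * b_kj
    return acc  # {row index: value} for the nonzeros of C[:, j]

# Check column 0 against SciPy's own SpGEMM.
A = sp.random(100, 100, density=0.05, format="csc", random_state=2)
B = sp.random(100, 100, density=0.05, format="csc", random_state=3)
col0 = hash_spgemm_column(A, B, 0)
ref = (A @ B).tocsc()[:, 0].tocoo()
assert all(np.isclose(col0[i], v) for i, v in zip(ref.row, ref.data))
```

In the distributed algorithm a similar accumulation merges the partial results received from other processes; this serial kernel only illustrates why no sorting step is required.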
