International Conference on High Performance Computing

Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms


Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many applications in High Performance Computing, such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality, and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing on SpGEMM using a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state-of-the-art using Intel® Xeon® processors. We are up to 3.8X faster than the Intel® Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementations of SpGEMM on the NVIDIA GTX Titan and the AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively, on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X using 28 threads), compared to MKL, which achieves 7.5X scaling on 28 threads.
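The cache optimization mentioned in the abstract — a dense accumulator array instead of a hash table — fits naturally into row-wise (Gustavson-style) SpGEMM: each row of C is accumulated into a dense array indexed directly by column, avoiding hash probing at the cost of one array of length n per thread. The sketch below is illustrative only, assuming plain CSR inputs; the function and variable names are ours, not the paper's code.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_cols):
    """Row-wise SpGEMM sketch: C = A * B with A, B in CSR form.

    Uses a dense accumulator (acc) plus a touched-column flag array,
    reused across rows, instead of a per-row hash table.
    """
    m = len(a_ptr) - 1
    c_ptr, c_idx, c_val = [0], [], []
    acc = [0.0] * n_cols        # dense accumulator, one slot per column of B
    flags = [False] * n_cols    # marks columns touched in the current row
    for i in range(m):
        touched = []
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[jj], a_val[jj]
            for kk in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[kk]
                if not flags[j]:            # first contribution to column j
                    flags[j] = True
                    acc[j] = 0.0
                    touched.append(j)
                acc[j] += a_ik * b_val[kk]  # direct indexed update, no hashing
        touched.sort()                      # emit row i in column order
        for j in touched:
            c_idx.append(j)
            c_val.append(acc[j])
            flags[j] = False                # reset only the slots we used
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In a parallel implementation each thread would own its own `acc`/`flags` pair and rows would be assigned dynamically to balance load, in the spirit of the partitioning and scheduling strategies the abstract evaluates.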
