International Conference on High Performance Computing

Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms


Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many applications in High Performance Computing, such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality, and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing on SpGEMM using a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state-of-the-art using Intel® Xeon® processors. We are up to 3.8X faster than the Intel® Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementations of SpGEMM on the NVIDIA GTX Titan and the AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively, on their published datasets. We demonstrate good multi-core scalability (geomean speedup of 18.2X using 28 threads), compared to MKL, which achieves 7.5X scaling on 28 threads.
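The cache optimization mentioned in the abstract — a dense accumulator array instead of a hash table — fits naturally into row-wise (Gustavson-style) SpGEMM: each row of C is accumulated into a dense array indexed directly by column, avoiding hash probing at the cost of one array of length n per thread. The sketch below is illustrative only, assuming plain CSR inputs; the function and variable names are ours, not the paper's code.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_cols):
    """Row-wise SpGEMM sketch: C = A * B with A, B in CSR form.

    Uses a dense accumulator (acc) plus a touched-column flag array,
    reused across rows, instead of a per-row hash table.
    """
    m = len(a_ptr) - 1
    c_ptr, c_idx, c_val = [0], [], []
    acc = [0.0] * n_cols        # dense accumulator, one slot per column of B
    flags = [False] * n_cols    # marks columns touched in the current row
    for i in range(m):
        touched = []
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[jj], a_val[jj]
            for kk in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[kk]
                if not flags[j]:            # first contribution to column j
                    flags[j] = True
                    acc[j] = 0.0
                    touched.append(j)
                acc[j] += a_ik * b_val[kk]  # direct indexed update, no hashing
        touched.sort()                      # emit row i in column order
        for j in touched:
            c_idx.append(j)
            c_val.append(acc[j])
            flags[j] = False                # reset only the slots we used
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In a parallel implementation each thread would own its own `acc`/`flags` pair and rows would be assigned dynamically to balance load, in the spirit of the partitioning and scheduling strategies the abstract evaluates.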
