首页> 外文期刊>Parallel Computing >Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems
【24h】

Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems

机译:在大规模并行系统上提高稀疏矩阵稠密矩阵乘法的性能

获取原文
获取原文并翻译 | 示例

摘要

We propose a comprehensive and generic framework to minimize multiple and different volume-based communication cost metrics for sparse matrix dense matrix multiplication (SpMM). SpMM is an important kernel that finds application in computational linear algebra and big data analytics. On distributed memory systems, this kernel is usually characterized with its high communication volume requirements. Our approach targets irregularly sparse matrices and is based on both graph and hypergraph partitioning models that rely on the widely adopted recursive bipartitioning paradigm. The proposed models are lightweight, portable (can be realized using any graph and hypergraph partitioning tool) and can simultaneously optimize different cost metrics besides total volume, such as maximum send receive volume, maximum sum of send and receive volumes, etc., in a single partitioning phase. They allow one to define and optimize as many custom volume-based metrics as desired through a flexible formulation. The experiments on a wide range of about thousand matrices show that the proposed models drastically reduce the maximum communication volume compared to the standard partitioning models that only address the minimization of total volume. The improvements obtained on volume-based partition quality metrics using our models are validated with parallel SpMM as well as parallel multi-source BFS experiments on two large-scale systems. For parallel SpMM, compared to the standard partitioning models, our graph and hypergraph partitioning models respectively achieve reductions of 14% and 22% in runtime, on average. Compared to the state-of-the-art partitioner UMPa, our graph model is overall 14.5x faster and achieves an average improvement of 19% in the partition quality on instances that are bounded by maximum volume. For parallel BFS, we show on graphs with more than a billion edges that the scalability can significantly be improved with our models compared to a recently proposed two-dimensional partitioning model. (C) 2016 Elsevier B.V. All rights reserved.
机译:我们提出了一个全面的通用框架,以最小化稀疏矩阵密集矩阵乘法(SpMM)的多个和不同的基于体积的通信成本度量。 SpMM是重要的内核,可在计算线性代数和大数据分析中找到应用。在分布式存储系统上,此内核通常具有很高的通信量要求。我们的方法针对不规则稀疏矩阵,并且基于图和超图分区模型,这些模型依赖于广泛采用的递归二分区范例。所提出的模型是轻量级的,可移植的(可以使用任何图形和超图分区工具来实现),并且可以同时优化除总体积外的其他成本指标,例如最大发送接收量,最大发送和接收量之和等。单个分区阶段。它们允许通过灵活的公式定义和优化所需数量的自定义基于数量的指标。在大约数千个矩阵的广泛范围内进行的实验表明,与仅解决总体积最小化的标准分区模型相比,该模型大大降低了最大通信量。使用我们的模型在基于卷的分区质量指标上获得的改进已通过并行SpMM以及两个大型系统上的并行多源BFS实验进行了验证。对于并行SpMM,与标准分区模型相比,我们的图和超图分区模型平均在运行时分别减少了14%和22%。与最新的分区器UMPa相比,我们的图形模型总体上提高了14.5倍,在以最大体积为边界的实例上,分区质量平均提高了19%。对于并行BFS,我们在具有十亿多条边的图形上显示,与最近提出的二维分区模型相比,使用我们的模型可以显着改善可伸缩性。 (C)2016 Elsevier B.V.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号