首页> 外文期刊>ACM Transactions on Parallel Computing >A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication
【24h】

A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication

机译:硬件高效对称稀疏矩阵乘法递归代数着色技术

获取原文
获取原文并翻译 | 示例
       

摘要

The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation that eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially. By means of a parameterized roofline model, we analyze the SymmSpMV performance in detail and discuss outliers. While we focus on SymmSpMV in this article, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
机译:对称稀疏矩阵矢量乘法(Symmspmv)是许多数值线性代数内核操作或图形遍历应用的重要构建块。由于需要管理结果向量的冲突更新,当今最多100个核心的多核平台上并行化Symmspmv很难。着色方法可用于解决此问题而无需数据复制,但现有的着色算法不会考虑负载平衡和深记忆层次结构,妨碍可扩展性和全芯片性能。在这项工作中,我们提出了递归代数着色发动机(RACE),一种新型着色算法和开源库实现,可以在硬件效率和平行化开销方面消除先前着色方法的缺点。我们描述了竞争中的水平施工,距离-K着色和负载平衡步骤,使用它并将其并行化Symmspmv,并在两个现代多核上的其他最先进的着色技术和英特尔MKL上比较了31个稀疏矩阵的性能处理器。比赛超越了所有其他方法。通过参数化的屋顶线型,我们详细分析了Symmspmv性能并讨论了异常值。虽然我们专注于本文中的SymmsPMV,但我们的算法和软件适用于任何稀疏矩阵操作,可以通过距离-K着色来解决数据依赖性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号