...
首页> 外文期刊>Acta Polytechnica >Acceleration Of Sparse Matrix-vector Multiplication By Region Traversal
【24h】

Acceleration Of Sparse Matrix-vector Multiplication By Region Traversal

机译:区域遍历对矩阵向量乘积的加速

获取原文
获取原文并翻译 | 示例

摘要

Sparse matrix-vector multiplication (shortly SpM× V) is one of most common subroutines in numerical linear algebra. The problem is that the memory access patterns during SpM ×V are irregular, and utilization of the cache can suffer from low spatial or temporal locality. Approaches to improve the performance of SpM × V are based on matrix reordering and register blocking. These matrix transformations are designed to handle randomly occurring dense blocks in a sparse matrix. The efficiency of these transformations depends strongly on the presence of suitable blocks. The overhead of reorganization of a matrix from one format to another is often of the order of tens of executions of SpM×V. For this reason, such a reorganization pays off only if the same matrix A is multiplied by multiple different vectors, e.g., in iterative linear solvers. This paper introduces an unusual approach to accelerate SpM×V. This approach can be combined with other acceleration approaches and consists of three steps: 1) dividing matrix A into non-empty regions, 2) choosing an efficient way to traverse these regions (in other words, choosing an efficient ordering of partial multiplications), 3) choosing the optimal type of storage for each region. All these three steps are tightly coupled. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache. The second step improves the locality during multiplication due to better utilization of distant references. The last step maximizes the machine computation performance of the partial multiplication for each region. In this paper, we describe aspects of these 3 steps in more detail (including fast and time-inexpensive algorithms for all steps). Our measurements prove that our approach gives a significant speedup for almost all matrices arising from various technical areas.
机译:稀疏矩阵向量乘法(简称SpM×V)是数值线性代数中最常见的子例程之一。问题在于SpM×V期间的内存访问模式是不规则的,并且高速缓存的利用率可能会受到空间或时间局部性较低的困扰。改善SpM×V性能的方法基于矩阵重排序和寄存器分块。这些矩阵变换被设计为处理稀疏矩阵中随机出现的密集块。这些转换的效率在很大程度上取决于合适块的存在。将矩阵从一种格式重组为另一种格式的开销通常约为SpM×V执行数十次。因此,只有在相同的矩阵A乘以多个不同的向量(例如在迭代线性求解器中)的情况下,这种重组才能奏效。本文介绍了一种不寻常的方法来加速SpM×V。该方法可以与其他加速方法结合使用,包括三个步骤:1)将矩阵A划分为非空区域,2)选择一种遍历这些区域的有效方法(换句话说,选择部分乘法的有效排序), 3)为每个区域选择最佳的存储类型。所有这三个步骤都是紧密耦合的。第一步,将整个矩阵分成可以放入缓存的较小部分(区域)。由于更好地利用了远距离参照,第二步改善了乘法过程中的局部性。最后一步使每个区域的部分乘法的计算机计算性能最大化。在本文中,我们将更详细地描述这三个步骤的各个方面(包括用于所有步骤的快速且省时的算法)。我们的测量结果证明,我们的方法可以显着提高各种技术领域产生的几乎所有矩阵的速度。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号