Acceleration Of Sparse Matrix-vector Multiplication By Region Traversal

I. Simecek

Abstract

Sparse matrix-vector multiplication (SpM×V for short) is one of the most common subroutines in numerical linear algebra. The problem is that memory access patterns during SpM×V are irregular, so cache utilization can suffer from low spatial or temporal locality. Existing approaches to improving SpM×V performance are based on matrix reordering and register blocking. These matrix transformations are designed to handle randomly occurring dense blocks in a sparse matrix, and their efficiency depends strongly on the presence of suitable blocks. The overhead of reorganizing a matrix from one format to another is often on the order of tens of SpM×V executions. For this reason, such a reorganization pays off only if the same matrix A is multiplied by many different vectors, e.g., in iterative linear solvers. This paper introduces an unusual approach to accelerating SpM×V. The approach can be combined with other acceleration approaches and consists of three steps: 1) dividing matrix A into non-empty regions, 2) choosing an efficient way to traverse these regions (in other words, choosing an efficient ordering of the partial multiplications), and 3) choosing the optimal type of storage for each region. These three steps are tightly coupled. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache. The second step improves locality during multiplication through better utilization of distant references. The last step maximizes the computational performance of the partial multiplication for each region. In this paper, we describe these three steps in more detail, including fast, low-overhead algorithms for each of them. Our measurements show that our approach gives a significant speedup for almost all matrices arising from various technical areas.
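The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify how regions are formed, ordered, or stored, so everything below is an assumption made for illustration. The function names (`split_into_regions`, `traversal_order`, `spmv_by_regions`), the fixed square region size, and the alternating (serpentine) ordering are hypothetical. The third step is reduced to a single storage type, with every region kept as a plain coordinate list.

```python
from collections import defaultdict


def split_into_regions(entries, region_size):
    """Step 1 (assumed scheme): group nonzeros by the square region they fall in.

    `entries` is an iterable of (row, col, value) triples. Only regions that
    contain at least one nonzero get a key, so empty regions are never stored.
    """
    regions = defaultdict(list)
    for r, c, v in entries:
        regions[(r // region_size, c // region_size)].append((r, c, v))
    return regions


def traversal_order(regions):
    """Step 2 (assumed scheme): visit row-blocks top to bottom, alternating the
    column direction, so consecutive regions tend to share a slice of x or y."""
    def key(block):
        block_row, block_col = block
        return (block_row, block_col if block_row % 2 == 0 else -block_col)
    return sorted(regions, key=key)


def spmv_by_regions(entries, x, n_rows, region_size):
    """Compute y = A*x as a sum of partial multiplications, one per region.

    Step 3 is simplified here: every region uses coordinate storage.
    """
    regions = split_into_regions(entries, region_size)
    y = [0.0] * n_rows
    for block in traversal_order(regions):
        for r, c, v in regions[block]:
            y[r] += v * x[c]
    return y
```

Because each nonzero belongs to exactly one region, the partial products add up to the same result as an ordinary SpM×V regardless of the traversal order. The order only affects which parts of `x` and `y` are touched close together in time, which is the locality effect the abstract refers to.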


Bibliographic details

Source
Acta Polytechnica | 2008, No. 4 | pp. 8-15 | 8 pages
Author
I. Simecek
Affiliation

Department of Computer Science, Czech Technical University in Prague, Faculty of Electrical Engineering, Technicka 2, 166 27 Prague 6, Czech Republic

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Natural sciences, general
Keywords
cache hierarchy; sparse matrix-vector multiplication; region traversal

