AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs

Abstract

Sparse matrix-vector multiplication (SpMV) is a fundamental computational kernel in science and engineering, and the performance of a large number of applications depends on its efficiency. The kernel is bandwidth-limited and is hard to optimize when the matrix has an irregular structure. The literature on implementing SpMV on throughput-oriented many-core processors is extensive and focuses mostly on matrix formats, proposing different ideas for adapting matrix sparsity to the underlying architecture. In this paper, we propose a novel ELL-based matrix format called Adaptive ELL (AdELL) to improve the state of the art of SpMV on Graphics Processing Units (GPUs). The AdELL format is based on the idea of distributing working threads to rows according to their computational load, creating balanced hardware-level blocks (warps) that take full advantage of the vectorized execution on Streaming Multiprocessors (SMs). The AdELL data structure is created using a novel warp-balancing heuristic designed to smooth the workload among warps without the need to tune any parameters. AdELL provides efficient warp-level synchronization (as opposed to block-level synchronization) but can also use atomic operations to distribute very skewed rows over multiple warps. Moreover, we introduce a loop-unrolling heuristic that optimizes SpMV performance by selecting the best unrolling factor based on the warp workload. We tested the proposed AdELL sparse format on a set of conventional benchmarks from heterogeneous application domains. The results show substantial and consistent performance improvements for double-precision calculations, outperforming the state-of-the-art ensemble framework clSpMV. We observed speedup peaks of up to 1.94x and a 25% (geometric) average improvement, which can potentially be increased to 43% by introducing a simple 1x2 blocking strategy.
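The idea of assigning threads to rows according to their load can be illustrated with a small CPU-side sketch. It is not the authors' implementation: the function names (`to_ell`, `ell_spmv`, `assign_threads`) are hypothetical, and the greedy rule in `assign_threads` is only a stand-in for the paper's warp-balancing heuristic, which the abstract does not specify. The sketch first shows plain ELL, where every row is padded to the longest row, and then shows how giving more threads to a skewed row lowers the worst per-thread load within one warp.

```python
import heapq
from typing import List, Tuple

Row = List[Tuple[int, float]]  # (column index, value) pairs of one sparse row


def to_ell(rows: List[Row]):
    """Pack rows into plain ELL: every row is padded to the longest row."""
    width = max((len(r) for r in rows), default=0)
    # Padding uses column 0 with value 0.0, so it contributes nothing to y.
    cols = [[c for c, _ in r] + [0] * (width - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals, width


def ell_spmv(cols, vals, x):
    """Compute y = A @ x for a matrix stored in the ELL arrays above."""
    return [sum(v * x[c] for c, v in zip(cr, vr)) for cr, vr in zip(cols, vals)]


def assign_threads(nnz_per_row: List[int], warp_size: int = 32) -> List[int]:
    """Greedy stand-in (an assumption, not the paper's heuristic):
    start with one thread per row, then repeatedly give a spare thread
    to the row whose per-thread load is currently the largest."""
    n = len(nnz_per_row)
    if n == 0:
        return []
    if n > warp_size:
        raise ValueError("more rows than threads in the warp")
    threads = [1] * n
    # Max-heap on per-thread load, implemented by negating the key.
    heap = [(-nnz, i) for i, nnz in enumerate(nnz_per_row)]
    heapq.heapify(heap)
    for _ in range(warp_size - n):
        _, i = heapq.heappop(heap)
        threads[i] += 1
        load = -(-nnz_per_row[i] // threads[i])  # ceil(nnz / threads)
        heapq.heappush(heap, (-load, i))
    return threads
```

For example, with a hypothetical warp of 8 threads and rows holding 2, 2, and 28 nonzeros, `assign_threads([2, 2, 28], warp_size=8)` gives the long row six threads, so the largest per-thread load falls from 28 elements (one thread per row) to 5. On a GPU the partial sums of a row split across threads would still need to be combined, which is where the warp-level synchronization mentioned in the abstract comes in.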

Bibliographic details

Source
International Conference on Parallel Processing | 2013 | pp. 11-20 | 10 pages
Authors
Maggioni Marco; Berger-Wolf Tanya
Original format: PDF
Keywords
Adaptive; ELL; GPU; Sparse Matrix Vector Multiplication; Warp-Balancing;

Similar literature

1. The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs [J]. Hoang-Vu Dang, Bertil Schmidt. Procedia Computer Science. 2012, Issue 1
2. A hybrid format for better performance of sparse matrix-vector multiplication on a GPU [J]. Guo Dahai, Gropp William, Olson Luke N. International Journal of High Performance Computing Applications. 2016, Issue 1
3. A Family of Bit-Representation-Optimized Formats for Fast Sparse Matrix-Vector Multiplication on the GPU [J]. Tang Wai Teng, Tan Wen Jun, Goh Rick Siow Mong. IEEE Transactions on Parallel and Distributed Systems. 2015, Issue 9
4. AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs [C]. Maggioni Marco, Berger-Wolf Tanya. International Conference on Parallel Processing. 2013
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Lambert, Matthew A. 2020
6. Fast and efficient fully 3D PET image reconstruction using sparse system matrix factorization with GPU acceleration [O]. Jian Zhou, Jinyi Qi
7. The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs [O]. Dang Hoang-Vu, Schmidt Bertil. 2012
