Conference: International Conference on Parallel Processing

AdELL: An Adaptive Warp-Balancing ELL Format for Efficient Sparse Matrix-Vector Multiplication on GPUs

Abstract

Sparse matrix-vector multiplication (SpMV) is a fundamental computational kernel in science and engineering, so the performance of a large number of applications depends on its efficiency. The kernel is bandwidth-limited and is hard to optimize when the matrix has an irregular structure. The literature on implementing SpMV on throughput-oriented many-core processors is extensive and focuses mostly on matrix formats, proposing different ways to adapt matrix sparsity to the underlying architecture. In this paper, we propose a novel ELL-based matrix format called Adaptive ELL (AdELL) to advance the state of the art of SpMV on Graphics Processing Units (GPUs). The AdELL format is based on the idea of distributing working threads to rows according to their computational load, creating balanced hardware-level blocks (warps) that take full advantage of the vectorized execution on Streaming Multiprocessors (SMs). The AdELL data structure is created using a novel warp-balancing heuristic designed to smooth the workload among warps without the need to tune any parameters. AdELL provides efficient warp-level synchronization (as opposed to block-level) but can also use atomic operations to distribute very skewed rows over multiple warps. Moreover, we introduce a loop-unrolling heuristic that optimizes SpMV performance by selecting the best unrolling factor based on the warp workload. We tested the proposed AdELL sparse format on a set of conventional benchmarks from heterogeneous application domains. The results show substantial and consistent performance improvements for double-precision calculations, outperforming the state-of-the-art ensemble framework clSpMV. We observed peak speedups of up to 1.94 and a 25% (geometric) average improvement, which can potentially be increased to 43% by introducing a simple 1x2 blocking strategy.
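To make the load-balancing idea concrete, here is a minimal CPU-side sketch in Python. It is not the AdELL heuristic, whose details the abstract does not give. It shows a plain padded ELL SpMV, then a simple greedy rule, assumed purely for illustration, that hands the 32 threads of one warp to rows in proportion to their nonzero counts. All function names (`ell_from_rows`, `ell_spmv`, `threads_per_row`, `warp_iterations`, `row_dot_split`) are hypothetical.

```python
import heapq
from math import ceil

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def ell_from_rows(rows):
    """Pack rows (lists of (col, value) pairs) into padded ELL arrays."""
    width = max(len(r) for r in rows)
    cols = [[c for c, _ in r] + [0] * (width - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals, width


def ell_spmv(cols, vals, x):
    """Plain ELL SpMV: one simulated thread per row; padding contributes 0."""
    return [sum(v * x[c] for c, v in zip(rc, rv)) for rc, rv in zip(cols, vals)]


def threads_per_row(row_nnz, warp_size=WARP_SIZE):
    """Greedy split of one warp's threads over its rows (illustrative only)."""
    if len(row_nnz) > warp_size:
        raise ValueError("more rows than threads in the warp")
    threads = [1] * len(row_nnz)  # every row gets at least one thread
    # Max-heap keyed on each row's current per-thread iteration count.
    heap = [(-nnz, i) for i, nnz in enumerate(row_nnz)]
    heapq.heapify(heap)
    for _ in range(warp_size - len(row_nnz)):
        _, i = heapq.heappop(heap)
        threads[i] += 1  # give the spare thread to the most loaded row
        heapq.heappush(heap, (-ceil(row_nnz[i] / threads[i]), i))
    return threads


def warp_iterations(row_nnz, threads):
    """Iterations the warp needs: its slowest thread sets the pace."""
    return max(ceil(n / t) for n, t in zip(row_nnz, threads))


def row_dot_split(row, x, n_threads):
    """Dot product of one row, split into strided per-thread partial sums."""
    partial = [sum(v * x[c] for c, v in row[t::n_threads])
               for t in range(n_threads)]
    return sum(partial)  # stands in for a warp-level reduction
```

For a skewed group of rows with nonzero counts `[60, 2, 1, 1]`, one thread per row makes the warp wait 60 iterations for the long row, whereas the greedy split above assigns most of the warp to that row and shortens the longest per-thread loop accordingly. On a GPU, the partial sums in `row_dot_split` would be combined with a warp-level reduction, and a row spread over several warps would need atomic additions, as the abstract notes.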
