Journal of Parallel and Distributed Computing

Optimization techniques for sparse matrix-vector multiplication on GPUs


Abstract

Sparse linear algebra is fundamental to numerous areas of applied mathematics, science and engineering. In this paper, we propose an efficient data structure named AdELL+ for optimizing the SpMV kernel on GPUs, focusing on the performance bottlenecks of sparse computation. The foundation of our work is an ELL-based adaptive format that copes with matrix irregularity through balanced warps built with a parametrized warp-balancing heuristic. We also address the intrinsically bandwidth-limited nature of SpMV with warp granularity, blocking, delta compression and nonzero unrolling, targeting both memory footprint and memory hierarchy efficiency. Finally, we introduce a novel online auto-tuning approach that uses a quality metric to predict efficient block factors and that hides preprocessing overhead with useful SpMV computation. Our experimental results show that AdELL+ achieves performance comparable to or better than other state-of-the-art sparse formats for SpMV proposed in academia (BCCOO) and industry (CSR+ and CSR-Adaptive). Moreover, our auto-tuning approach makes AdELL+ viable for real-world applications.
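
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sequential sketch of plain ELL storage combined with delta-compressed column indices. It is not the AdELL+ layout, whose details the abstract does not give. The function names `csr_to_ell_delta` and `spmv_ell_delta` are hypothetical, and the sketch assumes column indices are sorted within each row and that the matrix has at least one column.

```python
# Illustrative sketch only: plain ELL with delta-encoded column indices.
# This is NOT the AdELL+ format; names and layout are assumptions.


def csr_to_ell_delta(row_ptr, col_idx, values):
    """Convert CSR arrays to padded ELL rows with delta-encoded columns.

    Each row stores its first column index, then the gap to the previous
    column. Padding slots use delta 0 and value 0.0, so they re-read the
    last column and contribute nothing to the dot product.
    """
    n_rows = len(row_ptr) - 1
    width = max(
        (row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)), default=0
    )
    deltas, vals = [], []
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        row_deltas, prev = [], 0
        for c in col_idx[start:end]:
            row_deltas.append(c - prev)  # gap to the previous column index
            prev = c
        pad = width - (end - start)
        deltas.append(row_deltas + [0] * pad)
        vals.append(list(values[start:end]) + [0.0] * pad)
    return deltas, vals


def spmv_ell_delta(deltas, vals, x):
    """Compute y = A @ x, rebuilding column indices by a running sum."""
    y = []
    for row_deltas, row_vals in zip(deltas, vals):
        col, acc = 0, 0.0
        for d, v in zip(row_deltas, row_vals):
            col += d  # decode the absolute column index
            acc += v * x[col]
        y.append(acc)
    return y
```

On a GPU, each row (or group of rows) would typically be assigned to a thread or warp. The padding that plain ELL introduces for irregular matrices is the kind of overhead that the warp-balancing heuristic described in the abstract is meant to reduce.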
