Concurrency, Practice and Experience

A novel multi-graphics processing unit parallel optimization framework for the sparse matrix-vector multiplication



Abstract

The sparse matrix-vector multiplication (SpMV) is of great importance in scientific computations. Graphics processing unit (GPU)-accelerated SpMVs for large-sized problems have attracted considerable attention recently. We observe that on a specific multi-GPU platform, SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization framework, which involves the following parts: (1) a simple rule is defined to divide any given matrix among multiple GPUs; (2) a performance model, which is independent of the problems and dependent on the resources of the devices, is proposed to accurately predict the execution time of SpMV kernels; and (3) a selection algorithm is suggested to automatically select, on the basis of the performance model, the most appropriate of the storage formats involved in the framework for the matrix block assigned to each GPU. The objective of our framework is not to construct a new storage format or algorithm, but to automatically and rapidly generate an optimal parallel SpMV for any sparse matrix on a specific multi-GPU platform by integrating existing storage formats and their corresponding kernels. We take five popular storage formats as examples to present the idea of constructing the framework. Theoretically, we validate the correctness of our proposed SpMV performance model. This model is constructed only once for each type of GPU. Moreover, the framework is general and easy to extend: for a storage format that is not included in our framework, once the performance model of its corresponding SpMV kernel is successfully constructed, the format can be incorporated into the framework. The experiments validate the efficiency of our proposed framework.
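The three parts of the framework described above (a partition rule, a per-format performance model, and a model-driven selection algorithm) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the even row-split rule, the linear model form, its coefficients, and all function names are assumptions made for the sketch.

```python
# Illustrative sketch of the framework's selection idea; the partition
# rule, model form, coefficients, and names are assumptions, not the
# authors' code.
import numpy as np

def partition_rows(n_rows, n_gpus):
    """A simple predetermined rule: split row indices evenly among GPUs."""
    bounds = np.linspace(0, n_rows, n_gpus + 1, dtype=int)
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(n_gpus)]

# Hypothetical per-format performance models: predicted SpMV kernel time
# as a function of block features (nonzeros, rows).  In the paper's
# setting, such a model would be fitted once per GPU type.
MODELS = {
    "CSR": lambda nnz, rows: 1.0e-9 * nnz + 2.0e-8 * rows,
    "ELL": lambda nnz, rows: 1.5e-9 * nnz + 1.0e-8 * rows,
    "COO": lambda nnz, rows: 2.0e-9 * nnz + 0.5e-8 * rows,
}

def select_format(nnz, rows, models=MODELS):
    """Selection algorithm: pick the format with the smallest predicted time."""
    return min(models, key=lambda fmt: models[fmt](nnz, rows))
```

Because the models depend only on device resources and cheap block features, the selection runs once per block before any kernel launch; adding a new storage format only requires registering one more fitted model, which mirrors the extensibility claim in the abstract.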
