Concurrency, Practice and Experience

A novel multi-graphics processing unit parallel optimization framework for the sparse matrix-vector multiplication



Abstract

The sparse matrix-vector multiplication (SpMV) is of great importance in scientific computations. Graphics processing unit (GPU)-accelerated SpMVs for large-sized problems have attracted considerable attention recently. We observe that on a specific multi-GPU platform, SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization framework, which involves the following parts: (1) a simple rule is defined to divide any given matrix among multiple GPUs; (2) a performance model, which is independent of the problems and dependent on the resources of the devices, is proposed to accurately predict the execution time of SpMV kernels; and (3) a selection algorithm is suggested to automatically select, on the basis of the performance model, the most appropriate of the storage formats involved in the framework for the matrix block assigned to each GPU. The objective of our framework is not to construct a new storage format or algorithm, but to automatically and rapidly generate an optimal parallel SpMV for any sparse matrix on a specific multi-GPU platform by integrating existing storage formats and their corresponding kernels. We take five popular storage formats as examples to present the idea of constructing the framework. Theoretically, we validate the correctness of our proposed SpMV performance model. This model is constructed only once for each type of GPU. Moreover, the framework is general and easy to extend: for a storage format that is not included in our framework, once the performance model of its corresponding SpMV kernel is successfully constructed, the format can be incorporated into the framework. The experiments validate the efficiency of our proposed framework.
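The three parts of the framework described above (a partition rule, a per-format performance model, and a model-driven selection algorithm) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the even row-split rule, the linear model form, its coefficients, and all function names are assumptions made for the sketch.

```python
# Illustrative sketch of the framework's selection idea; the partition
# rule, model form, coefficients, and names are assumptions, not the
# authors' code.
import numpy as np

def partition_rows(n_rows, n_gpus):
    """A simple predetermined rule: split row indices evenly among GPUs."""
    bounds = np.linspace(0, n_rows, n_gpus + 1, dtype=int)
    return [(int(bounds[i]), int(bounds[i + 1])) for i in range(n_gpus)]

# Hypothetical per-format performance models: predicted SpMV kernel time
# as a function of block features (nonzeros, rows).  In the paper's
# setting, such a model would be fitted once per GPU type.
MODELS = {
    "CSR": lambda nnz, rows: 1.0e-9 * nnz + 2.0e-8 * rows,
    "ELL": lambda nnz, rows: 1.5e-9 * nnz + 1.0e-8 * rows,
    "COO": lambda nnz, rows: 2.0e-9 * nnz + 0.5e-8 * rows,
}

def select_format(nnz, rows, models=MODELS):
    """Selection algorithm: pick the format with the smallest predicted time."""
    return min(models, key=lambda fmt: models[fmt](nnz, rows))
```

Because the models depend only on device resources and cheap block features, the selection runs once per block before any kernel launch; adding a new storage format only requires registering one more fitted model, which mirrors the extensibility claim in the abstract.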
