Efficient Triangular Matrix Vector Multiplication on the GPU


Abstract

The main purpose of this paper is to present a very efficient GPU implementation of trmv, the product of a triangular matrix and a vector. Developers usually compute trmv with cuBLAS, a linear algebra library optimized for each generation of GPUs. To outperform cuBLAS, our GPU implementation of trmv uses various acceleration techniques for the latest GPUs. More specifically, it has the following features: (1) only one kernel is called; (2) the maximum number of threads is invoked; (3) all global memory accesses are coalesced; (4) no shared memory access incurs a bank conflict; and (5) shared memory access is minimized by using a warp shuffle function. Experimental results on five generations of NVIDIA GPUs, for fp32 matrices of sizes from 32 × 32 to 16K × 16K, show that our GPU implementation is faster than cuBLAS and muBLAS for almost all matrix sizes and GPU generations.
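For orientation, the operation described in the abstract can be sketched sequentially. The following is a minimal reference, not the paper's GPU kernel: it assumes an upper-triangular matrix stored as a full list of rows (the abstract does not say which triangle or storage layout is used), and the names `trmv_upper` and `warp_sum` are hypothetical. The second function only emulates, on the CPU, the tree-shaped reduction that a warp shuffle typically performs across 32 lanes; it assumes a power-of-two lane count.

```python
def trmv_upper(a, x):
    """Return y = A x, assuming A is upper triangular (assumption, see lead-in)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Only columns j >= i can hold nonzeros in an upper-triangular matrix,
        # so about half of the multiply-adds of a general product are skipped.
        for j in range(i, n):
            acc += a[i][j] * x[j]
        y[i] = acc
    return y


def warp_sum(lane_values):
    """Sequentially emulate a shuffle-down tree reduction over one warp.

    Assumes len(lane_values) is a power of two (32 on NVIDIA GPUs).
    Only the value left in lane 0 is the full sum.
    """
    vals = list(lane_values)
    offset = len(vals) // 2
    while offset > 0:
        # Each lane adds the value held by the lane `offset` positions above it;
        # lanes without such a partner keep their value unchanged here.
        vals = [
            v + (vals[i + offset] if i + offset < len(vals) else 0.0)
            for i, v in enumerate(vals)
        ]
        offset //= 2
    return vals[0]


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0],
         [0.0, 4.0, 5.0],
         [0.0, 0.0, 6.0]]
    print(trmv_upper(a, [1.0, 1.0, 1.0]))
    print(warp_sum([float(k) for k in range(32)]))
```

On a GPU, a reduction of this shape lets the threads of a warp combine partial dot products in registers rather than through shared memory, which is the general idea behind feature (5); how the paper arranges it in detail is not stated in the abstract.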


Bibliographic details

Source
International Conference on Parallel Processing and Applied Mathematics; Workshop on Models, Algorithms, and Methodologies for Hierarchical Parallelism in New HPC Systems; Workshop on Scheduling for Parallel Computing; Minisymposium on HPC Applications in Physical Sciences; Workshop on Power and Energy Aspects of Computation; Workshop on Complex Collective Systems; Minisymposium on High Performance Computing Interval Methods; Workshop on Language-Based Parallel Programming Models; Workshop on Applied High-Performance Numerical Algorithms in PDEs | 2020 | 581p | 12 pages in total
Authors
Takahiro Inoue; Hiroki Tokura; Koji Nakano; Yasuaki Ito
Original format
PDF
Chinese Library Classification
TP316.4-53
Keywords
Matrix multiplication; Trmv; Parallel algorithm; GPGPU


