International Symposium on Computing and Networking Workshops

An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations



Abstract

A specialized unit in NVIDIA's GPUs, called Tensor Core, has kept attracting attention over the last couple of years due to its high computing capability for general matrix-matrix multiplications (GEMMs). A Tensor Core unit calculates a matrix multiply-accumulate (MMA) operation of a fixed size. However, if the input matrices are skinnier than the operands of a Tensor Core operation, part of the operation's computation is wasted. Thus, this paper presents a method for optimizing skinny matrix-matrix multiplication that exploits the full potential of the Tensor Core units. The proposed method feeds multiple segments of an input matrix into a single Tensor Core operation so that more of its computation is utilized. The experimental results show that the proposed method achieves up to a 2.7× speedup compared with the cuBLAS 11.0 library.
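The abstract describes the folding idea only at a high level. As an illustration of the general principle (not the paper's actual packing scheme, which is not given here), the following NumPy sketch packs two independent small products block-diagonally into one full-size, MMA-shaped tile multiply, so a single tile-sized multiplication yields both results instead of wasting half of its arithmetic on zero padding. All tile and segment sizes are assumptions chosen to mirror a 16×16×16 MMA with 8-wide operands.

```python
import numpy as np

# Hedged sketch: fold two independent small matrix products into one
# tile-sized multiply. A Tensor Core MMA computes D = A_tile @ B_tile
# for a fixed tile size (e.g., m = n = k = 16). If the real operands
# are only 8 wide, a naive mapping pads with zeros and wastes half of
# the tile's arithmetic on each MMA.

TILE = 16          # assumed MMA tile dimension
SEG = TILE // 2    # assumed width of each skinny segment (8 here)

rng = np.random.default_rng(0)
A1 = rng.standard_normal((SEG, SEG)).astype(np.float32)
A2 = rng.standard_normal((SEG, SEG)).astype(np.float32)
B = rng.standard_normal((SEG, SEG)).astype(np.float32)

# Folded operands: A segments placed block-diagonally, B replicated
# along the k dimension so each A segment meets its own copy of B.
A_tile = np.zeros((TILE, TILE), dtype=np.float32)
A_tile[:SEG, :SEG] = A1            # segment 1 in the top-left block
A_tile[SEG:, SEG:] = A2            # segment 2 in the bottom-right block

B_tile = np.zeros((TILE, TILE), dtype=np.float32)
B_tile[:SEG, :SEG] = B
B_tile[SEG:, :SEG] = B

D = A_tile @ B_tile                # one tile-sized multiply (one "MMA")

# Both small products come out of the single tile result:
# rows 0..7 hold A1 @ B, rows 8..15 hold A2 @ B.
assert np.allclose(D[:SEG, :SEG], A1 @ B, atol=1e-5)
assert np.allclose(D[SEG:, :SEG], A2 @ B, atol=1e-5)
```

In this toy layout, one tile multiply replaces the two padded MMAs a naive mapping would need, which is the kind of utilization gain the abstract attributes to folding.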
