Strassen's Matrix Multiplication on GPUs

Abstract

We provide efficient single-precision and integer GPU implementations of Strassen's algorithm as well as of Winograd's variant. On an NVIDIA C1060 GPU, when multiplying 16384×16384 matrices, the 4-level implementation of Strassen's algorithm achieves a speedup of 32% (35%) and Winograd's variant a speedup of 33% (36%) relative to the sgemm (integer version of sgemm) code in CUBLAS 3.0. When n = 16384, the maximum numerical error of the single-precision implementations is about 2 orders of magnitude higher than that of sgemm; for the integer implementations it is zero.
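For readers unfamiliar with the recursion, the following is a minimal CPU-side sketch in plain Python of Strassen's algorithm with a configurable number of levels. It is not the authors' GPU code; the function names (`strassen`, `classical_mul`, `split`) and the `levels` parameter are assumptions made for illustration. Each level replaces the 8 half-size block products of the classical method with 7, at the cost of extra block additions and subtractions.

```python
def mat_add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def classical_mul(A, B):
    """Classical O(n^3) product, used as the base case and as a reference."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, levels):
    """Multiply square matrices using `levels` levels of Strassen recursion.

    Falls back to the classical product when no levels remain or when the
    current size is odd (this sketch does no padding).
    """
    n = len(A)
    if levels == 0 or n % 2 != 0:
        return classical_mul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    def rec(X, Y):
        return strassen(X, Y, levels - 1)

    # The seven half-size products of Strassen's scheme.
    M1 = rec(mat_add(A11, A22), mat_add(B11, B22))
    M2 = rec(mat_add(A21, A22), B11)
    M3 = rec(A11, mat_sub(B12, B22))
    M4 = rec(A22, mat_sub(B21, B11))
    M5 = rec(mat_add(A11, A12), B22)
    M6 = rec(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = rec(mat_sub(A12, A22), mat_add(B21, B22))

    # Combine the products into the four quadrants of the result.
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_add(mat_sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

With integer entries, `strassen(A, B, 4)` returns exactly the same matrix as `classical_mul(A, B)`, which mirrors the zero error reported for the integer implementations (Python integers also never overflow, so the comparison is exact). With floating-point entries the additional additions and subtractions change the rounding, which is the kind of effect behind the larger single-precision error noted in the abstract.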