Conference: ICNC 2012

A Fast implementation of matrix-matrix product in double-double precision on NVIDIA C2050 and application to semidefinite programming

Abstract

We have implemented a fast double-double precision (approximately 32 significant decimal digits) version of the matrix-matrix multiplication routine "Rgemm" of MPACK (http://mplapack.sourceforge.net/) on the NVIDIA C2050 GPU. This routine is a higher-precision version of "dgemm" in the BLAS (Basic Linear Algebra Subprograms) library. Our implementation is the fastest to date on the NVIDIA C2050 and the most efficient on NVIDIA GPUs. We achieved a peak performance of 16.4 GFlops for the kernel alone (16.1 GFlops with CPU-GPU transfer included), and 26.4 GFlops (25.7 GFlops with CPU-GPU transfer included) by employing lower-accuracy arithmetic. These figures are 92.3% (90.7%) and 87.1% (84.8%) of the theoretical peak performance of the NVIDIA C2050, and about 150 times faster than the reference implementation on an Intel Xeon X3470. Moreover, our implementation can handle matrices of arbitrary size by employing the "pointer redirecting" technique of Nath et al. We integrated this GPU-accelerated Rgemm into SDPA-DD, a double-double precision semidefinite programming solver, and performance improved by up to 14.5 times. This version of Rgemm has been available at http://mplapack.sourceforge.net/ since 2011/10/28.
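For readers unfamiliar with double-double arithmetic, the following is a minimal CPU-side sketch in Python of what a double-double matrix product computes. It is not the authors' GPU kernel and does not use the MPACK API. All names (`two_sum`, `two_prod`, `dd_add`, `dd_mul`, `rgemm_ref`) are hypothetical, and the code assumes IEEE 754 double precision with round-to-nearest. Each value is an unevaluated sum `hi + lo` of two doubles, built from the well-known error-free transformations of Knuth (addition) and Dekker (multiplication). The add and multiply shown here are the common "sloppy" variants that skip part of the error handling; whether this corresponds to the "lower-accuracy arithmetic" mentioned in the abstract is an assumption.

```python
from fractions import Fraction  # used only to check results exactly


def two_sum(a, b):
    """Knuth's error-free addition: s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Renormalise a (hi, lo) pair; intended for |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def split(a):
    """Veltkamp split of a double into two non-overlapping halves."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Dekker's error-free product: p = fl(a * b) and p + e == a * b exactly
    (assuming no overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Sloppy double-double addition of (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


def dd_mul(x, y):
    """Sloppy double-double multiplication of (hi, lo) pairs."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)


def rgemm_ref(A, B):
    """Naive triple-loop C = A * B over double-double entries."""
    n, m, q = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0) for _ in range(q)] for _ in range(n)]
    for i in range(n):
        for j in range(q):
            acc = (0.0, 0.0)
            for k in range(m):
                acc = dd_add(acc, dd_mul(A[i][k], B[k][j]))
            C[i][j] = acc
    return C


def to_dd(x):
    """Promote a plain double to a double-double pair."""
    return (x, 0.0)


# A 1x3 by 3x1 product where plain double arithmetic loses the 1.0 entirely.
A = [[to_dd(1e16), to_dd(1.0), to_dd(-1e16)]]
B = [[to_dd(1.0)], [to_dd(1.0)], [to_dd(1.0)]]
C = rgemm_ref(A, B)
plain = (1e16 * 1.0 + 1.0 * 1.0) + (-1e16) * 1.0
print("double-double:", C[0][0], "plain double:", plain)
```

In this example the plain double sum cancels to zero, while the double-double accumulator keeps the small term in its low word. Each double-double multiply-add costs many ordinary floating-point operations, which is why the throughput figures in the abstract are far below the GPU's double-precision peak.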
