Conference: International Conference on Networking and Computing

A Fast Implementation of Matrix-matrix Product in Double-double Precision on NVIDIA C2050 and Application to Semidefinite Programming

Abstract

We have implemented a fast double-double precision (approximately 32 significant decimal digits) version of the matrix-matrix multiplication routine "Rgemm" of MPACK (http://mplapack.sourceforge.net/) on the NVIDIA C2050 GPU. This routine is a higher-precision version of "dgemm" in the BLAS (Basic Linear Algebra Subprograms) library. Our implementation is the fastest to date on the NVIDIA C2050 and the most efficient on NVIDIA GPUs: we achieved peak performance of 16.4 GFlops for the kernel (16.1 GFlops with CPU-GPU transfer included), and 26.4 GFlops (25.7 GFlops with CPU-GPU transfer included) by employing lower-accuracy arithmetic. These are 92.3% (90.7%) and 87.1% (84.8%) of the theoretical peak performance of the NVIDIA C2050, and about 150 times faster than the reference implementation on an Intel Xeon X3470. Moreover, our implementation can handle matrices of arbitrary size by employing the "pointer redirecting" technique of Nath et al. We integrated this GPU-accelerated version of Rgemm into SDPA-DD, a double-double precision semidefinite programming solver, and performance improved by up to 14.5 times. This version of Rgemm has been available at http://mplapack.sourceforge.net/ since 2011/10/28.
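To make the idea of double-double arithmetic concrete, the following is a minimal CPU-side sketch in Python, not the GPU kernel described in the abstract. A double-double value is stored as a pair (hi, lo) of ordinary doubles whose sum carries roughly twice the precision. The function names (`two_sum`, `two_prod`, `dd_add`, `dd_mul`, `dd_gemm`) are hypothetical and chosen for illustration; the building blocks are the standard error-free transformations of Knuth and Dekker. The addition shown is the simpler, less accurate variant from the double-double literature; whether it matches the "lower-accuracy arithmetic" mentioned in the abstract is an assumption.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free sum (Knuth): s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Error-free sum that assumes |a| >= |b| (or a == 0)."""
    s = a + b
    e = b - (s - a)
    return s, e


def split(a):
    """Dekker split: a == hi + lo, each half short enough to multiply exactly."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Error-free product without FMA: p = fl(a * b) and p + e == a * b."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Add two double-doubles; simple variant that drops some low-order error."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


def dd_mul(x, y):
    """Multiply two double-doubles, ignoring the lo * lo term."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)


def dd_gemm(A, B):
    """Naive C = A * B on lists of lists of (hi, lo) pairs."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = (0.0, 0.0)
            for l in range(k):
                acc = dd_add(acc, dd_mul(A[i][l], B[l][j]))
            C[i][j] = acc
    return C


def dd_to_fraction(x):
    """Exact rational value of a double-double, useful for checking results."""
    return Fraction(x[0]) + Fraction(x[1])
```

For example, `dd_add((1.0, 0.0), (1e-20, 0.0))` keeps the small term in the low word, whereas the plain double expression `(1.0 + 1e-20) - 1.0` loses it entirely. Each double-double multiply-add costs on the order of tens of double-precision operations, which is why throughput for such a routine is far below the hardware's double-precision peak.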
