

A Fast Implementation of Matrix-matrix Product in Double-double Precision on NVIDIA C2050 and Application to Semidefinite Programming



Abstract

We have implemented a fast double-double precision (approximately 32 significant decimal digits) version of the matrix-matrix multiplication routine "Rgemm" of MPACK (http://mplapack.sourceforge.net/) on the NVIDIA C2050 GPU. This routine is a higher-precision version of "dgemm" in the BLAS (Basic Linear Algebra Subprograms) library. Our implementation is the fastest to date on the NVIDIA C2050 and the most efficient on NVIDIA GPUs: we achieved peak performances of 16.4 GFlops for the kernel (16.1 GFlops with CPU-GPU transfer included), and 26.4 GFlops (25.7 GFlops with CPU-GPU transfer included) by employing lower-accuracy arithmetic. These are 92.3% (90.7%) and 87.1% (84.8%) of the theoretical peak performance of the NVIDIA C2050, and about 150 times faster than the reference implementation on an Intel Xeon X3470. Moreover, our implementation can handle matrices of arbitrary size by employing the "pointer redirecting" technique of Nath et al. We integrated this GPU-accelerated version of Rgemm into the double-double precision version of the semidefinite programming solver SDPA-DD, and performance improved by up to 14.5 times. This version of Rgemm has been available at http://mplapack.sourceforge.net/ since 2011/10/28.
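To make the idea of double-double arithmetic concrete, the following is a minimal Python sketch and not the authors' CUDA kernel. It represents each number as a pair (hi, lo) of ordinary doubles and builds a naive matrix product on top of two well-known error-free transformations (Knuth's two-sum and Dekker's product with Veltkamp splitting). The names `dd_add`, `dd_mul`, `dd_gemm`, and `dd_to_fraction` are hypothetical, chosen for illustration only. The simplified addition used here skips part of the low-order error handling, which is one common way to trade accuracy for speed; whether it matches the "lower-accuracy arithmetic" of the abstract is an assumption.

```python
from fractions import Fraction


def two_sum(a, b):
    """Knuth's error-free addition: returns (s, e) with s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into high and low halves (a == hi + lo)."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Dekker's error-free product: p + e == a * b, barring overflow/underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Simplified (lower-accuracy) double-double addition of (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    hi = s + e
    lo = e - (hi - s)  # renormalize so that |lo| is tiny relative to |hi|
    return hi, lo


def dd_mul(x, y):
    """Double-double multiplication; the lo*lo term is dropped as negligible."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    hi = p + e
    lo = e - (hi - p)
    return hi, lo


def dd_gemm(A, B):
    """Naive C = A * B for matrices given as nested lists of (hi, lo) pairs."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = (0.0, 0.0)
            for p in range(k):
                acc = dd_add(acc, dd_mul(A[i][p], B[p][j]))
            C[i][j] = acc
    return C


def dd_to_fraction(x):
    """Exact rational value of a double-double number, for checking results."""
    return Fraction(x[0]) + Fraction(x[1])
```

For example, `dd_add((1.0, 0.0), (1e-20, 0.0))` keeps the small term in the low word and returns `(1.0, 1e-20)`, whereas a plain double addition would round it away. A GPU implementation would typically replace the Veltkamp split with a fused multiply-add and tile the triple loop across thread blocks; those details are outside this sketch.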


Bibliographic information

Source
International Conference on Networking and Computing, 2012, 8 pages
Authors
Nakata Maho; Takao Yasuyoshi; Noda Shigeho; Himeno Ryutaro;
Original format: PDF
Chinese Library Classification: TP393-53
Keywords
BLAS; GPU; MPACK; double-double precision; multiple precision;


