International Conference on Parallel and Distributed Systems

Improving Performance of Matrix Multiplication and FFT on GPU


Abstract

In this paper we discuss our experience in improving the performance of two key algorithms implemented in CUDA: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and the single-precision FFT. The former is computation-intensive, while the latter is memory-bandwidth- or communication-intensive. For SGEMM, a peak performance of 393 Gflops is achieved on an NVIDIA GeForce GTX 280, about 5% faster than the CUBLAS 2.0 library. Better FFT performance is obtained over a range of problem sizes. Some common principles for the design and implementation of many-core algorithms are discussed.
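The abstract contrasts computation-intensive SGEMM with bandwidth-intensive FFT. As background to the blocking technique that CUDA SGEMM implementations build on, the following is a minimal, hypothetical tiled kernel: it is not the authors' kernel, and `TILE`, `sgemm_tiled`, and the square-matrix assumption (`N` a multiple of `TILE`) are illustrative choices only.

```cuda
// Minimal sketch of a shared-memory-tiled SGEMM kernel:
// C = alpha * A * B + beta * C for N x N row-major matrices,
// assuming N is a multiple of TILE. Illustration only.
#include <cuda_runtime.h>

#define TILE 16

__global__ void sgemm_tiled(const float *A, const float *B, float *C,
                            int N, float alpha, float beta)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk the K dimension one TILE-wide slab at a time, staging each
    // slab of A and B in on-chip shared memory so that every global
    // load is reused TILE times, cutting off-chip memory traffic.
    for (int t = 0; t < N / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = alpha * acc + beta * C[row * N + col];
}
```

A kernel like this would be launched with a `dim3(TILE, TILE)` block and an `N/TILE` by `N/TILE` grid; reaching the peak figures reported in the paper additionally requires register blocking and careful memory-access tuning beyond this sketch.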
