Source: Applied Parallel and Scientific Computing, Part 1 (conference proceedings)

Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs


Abstract

We implemented the quadruple precision Basic Linear Algebra Subprograms (BLAS) functions AXPY, GEMV, and GEMM on graphics processing units (GPUs) and evaluated their performance. We used DD-type (double-double) quadruple precision operations, which combine two double precision values to represent one quadruple precision value. On an NVIDIA Tesla C1060, our BLAS functions are up to approximately 30 times faster than the existing quadruple precision BLAS on an Intel Core i7 920. Additionally, quadruple precision AXPY takes only approximately 2.7 times longer to execute than double precision AXPY on the Tesla C1060. We have shown that quadruple precision BLAS operations are suitable for GPUs.
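The DD-type representation mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation: it is a minimal Python illustration of the general double-double technique, assuming the standard error-free transformations (Knuth's two-sum and Dekker's product with Veltkamp splitting) and a simplified, less accurate addition. All function names (`two_sum`, `two_prod`, `dd_add`, `dd_mul`, `dd_axpy`, `dd_to_fraction`) are hypothetical and chosen only for this example. A DD value is stored as a pair `(hi, lo)` of ordinary doubles whose unevaluated sum is the represented number.

```python
from fractions import Fraction

SPLITTER = 134217729.0  # 2**27 + 1, Veltkamp splitting constant for binary64


def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e


def quick_two_sum(a, b):
    """Like two_sum, but only valid when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def split(a):
    """Split a double into two halves with at most 26 significant bits each."""
    t = SPLITTER * a
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and p + e == a * b (no overflow/underflow)."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e


def dd_add(x, y):
    """Simplified DD addition of x = (hi, lo) and y = (hi, lo)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)


def dd_mul(x, y):
    """DD multiplication; the lo * lo term is dropped as negligible."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)


def dd_axpy(alpha, xs, ys):
    """Return alpha * x + y element-wise, where every value is a DD pair."""
    return [dd_add(dd_mul(alpha, x), y) for x, y in zip(xs, ys)]


def dd_to_fraction(x):
    """Exact rational value of a DD pair, useful for checking results."""
    return Fraction(x[0]) + Fraction(x[1])
```

As a usage example, `dd_axpy((3.0, 0.0), [(1.0, 2.0**-60)], [(1.0, 0.0)])` keeps the small `3 * 2**-60` contribution in the low word, whereas the same computation in plain double precision rounds it away. Each DD addition or multiplication costs many double precision operations, which is why DD arithmetic is far more compute-intensive than plain double arithmetic for the same amount of data.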
