2012 Symposium on Application Accelerators in High Performance Computing

A Trip to Tahiti: Approaching a 5 TFlop SGEMM Using 3 AMD GPUs



Abstract

Using GPUs as computational accelerators has been a growing area of research over the past several years. Dense linear algebra is one area particularly amenable to exploiting video card hardware. We continue this trend by generalizing the MAGMA xGEMM kernels, porting them to OpenCL, and tuning them to run on the AMD 7970. Having achieved up to 1.7 TFlops in SGEMM and 650 GFlops in DGEMM, we extend this performance to multiple GPUs using a parallel-for algorithm designed to run on multiple heterogeneous devices. Using 3 Radeon 7970s, our large GEMM algorithm obtains 4.37 TFlops in single precision and 1.64 TFlops in double precision.
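The abstract does not spell out how the parallel-for algorithm works. As a rough illustration only, the sketch below shows one common way such a scheme can be built: the output matrix is split into tiles, and each worker claims the next tile from a shared queue. Everything here is an assumption made for illustration. The names `gemm_tile` and `parallel_for_gemm` are hypothetical, Python threads stand in for GPUs, and a plain triple loop stands in for the tuned OpenCL kernel.

```python
import queue
import threading


def gemm_tile(A, B, C, rows, cols):
    """Compute C[i][j] = sum_k A[i][k] * B[k][j] for one output tile.

    Stand-in for a per-device GEMM kernel launch (hypothetical).
    """
    inner = len(B)
    for i in rows:
        for j in cols:
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(inner))


def parallel_for_gemm(A, B, num_devices=3, tile=2):
    """Multiply A by B, with each worker pulling output tiles from a shared queue.

    Each worker thread plays the role of one device. Tiles are disjoint,
    so workers never write to the same element of C.
    """
    m, n = len(A), len(B[0])
    C = [[0] * n for _ in range(m)]

    # Enqueue every output tile as a (row range, column range) pair.
    work = queue.Queue()
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            work.put((range(i0, min(i0 + tile, m)),
                      range(j0, min(j0 + tile, n))))

    tiles_done = [0] * num_devices  # tiles completed by each worker

    def worker(dev):
        while True:
            try:
                rows, cols = work.get_nowait()
            except queue.Empty:
                return  # no tiles left
            gemm_tile(A, B, C, rows, cols)
            tiles_done[dev] += 1

    threads = [threading.Thread(target=worker, args=(d,))
               for d in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C, tiles_done


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1], [2, 2, 2]]
    B = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]
    C, tiles_done = parallel_for_gemm(A, B)
    print(C)
    print(tiles_done)
```

Because workers claim tiles on demand, a faster device would simply end up processing more tiles than a slower one. That is one plausible reason to prefer dynamic scheduling when the devices are heterogeneous, though the paper's actual scheduling policy may differ.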
