A Trip to Tahiti: Approaching a 5 TFlop SGEMM Using 3 AMD GPUs

Abstract

Using GPUs as computational accelerators has been a growing area of research over the past several years. One area particularly amenable to exploiting video card hardware is dense linear algebra. We continue this trend by generalizing the MAGMA xGEMM kernels, porting them to OpenCL, and tuning them to run on the AMD 7970. On a single 7970 the tuned kernels achieve up to 1.7 TFlops in SGEMM and 650 GFlops in DGEMM. We extend this performance to multiple GPUs using a parallel-for algorithm designed to run on multiple heterogeneous devices. Using three Radeon 7970s, our large-GEMM algorithm obtains 4.37 TFlops in single precision and 1.64 TFlops in double precision.
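The headline numbers can be checked with a little arithmetic, and the multi-device "parallel-for" can be illustrated with a toy scheduler. What follows is a minimal Python sketch under stated assumptions, not the authors' OpenCL implementation: `gemm_flops`, `scaling_efficiency` and `parallel_for_gemm` are hypothetical names, each thread merely stands in for one device, and the shared work queue is a generic dynamic self-scheduling technique that may differ from the paper's scheduler. It uses the conventional 2mnk flop count for GEMM. Under that reading, three 7970s at 1.7 TFlops each would ideally reach 5.1 TFlops, so 4.37 TFlops is roughly 86% of linear scaling.

```python
import queue
import threading


def gemm_flops(m: int, n: int, k: int) -> int:
    """Flop count for C = A @ B with A (m x k) and B (k x n).

    Each of the m*n outputs needs k multiply-adds, conventionally
    counted as 2 flops each, giving 2*m*n*k.
    """
    return 2 * m * n * k


def scaling_efficiency(multi_rate: float, single_rate: float, devices: int) -> float:
    """Fraction of ideal linear speedup achieved across `devices` GPUs."""
    return multi_rate / (single_rate * devices)


def parallel_for_gemm(A, B, tile_rows: int, num_workers: int):
    """Hypothetical dynamic parallel-for over row tiles of C = A @ B.

    Each worker thread stands in for one device (an assumption for
    illustration). Workers repeatedly pull the next unprocessed tile from
    a shared queue, so a faster worker ends up computing more tiles.
    Returns C and the number of tiles each worker processed.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]

    # Fill the queue before any worker starts, so "empty" means "done".
    tiles = queue.Queue()
    for start in range(0, m, tile_rows):
        tiles.put((start, min(start + tile_rows, m)))
    tiles_done = [0] * num_workers

    def worker(wid: int) -> None:
        while True:
            try:
                lo, hi = tiles.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            # A plain triple loop stands in for a tuned GPU GEMM kernel.
            for i in range(lo, hi):
                for j in range(n):
                    C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
            tiles_done[wid] += 1  # each worker touches only its own slot

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C, tiles_done


if __name__ == "__main__":
    # Scaling implied by the abstract's single- and multi-GPU figures.
    print(f"SGEMM: {scaling_efficiency(4.37, 1.7, 3):.1%} of 3 x 1.7 TFlops")
    print(f"DGEMM: {scaling_efficiency(1.64, 0.650, 3):.1%} of 3 x 650 GFlops")
```

Running the script prints `SGEMM: 85.7% of 3 x 1.7 TFlops` and `DGEMM: 84.1% of 3 x 650 GFlops`. Pulling tiles from a shared queue lets a faster device simply take more tiles, which is one common way to balance work across heterogeneous devices; a static split proportional to each device's measured rate is the usual alternative.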

Bibliographic record

Source
2012 Symposium on Application Accelerators in High Performance Computing | 2012 | pp. 19-25 | 7 pages
Conference location: Argonne, IL (US)
Authors
Weber, Rick; Peterson, Gregory D.
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology

Similar literature

1. AMD GPUs tout performance as high as 2 TFLOPS [J]. Brian Dipert. Electrical Design News. 2010, no. 21
2. GPUs für Wearables bis Server bieten TFLOPs (GPUs from wearables to servers deliver TFLOPs) [J]. Elektronikpraxis. 2015, no. 3
3. 145 TFlops Performance on 3990 GPUs of TSUBAME 2.0 Supercomputer for an Operational Weather Prediction [J]. Takashi Shimokawabe, Takayuki Aoki, Junichi Ishida, Procedia Computer Science. 2011, no. 1
4. A Trip to Tahiti: Approaching a 5 TFlop SGEMM Using 3 AMD GPUs [C]. Weber Rick, Peterson Gregory D. Symposium on Application Accelerators in High Performance Computing. 2012
5. A GIS approach to linking spatial patterns and trip generation/trip distribution modeling [D]. Harris, David Michael. 1995
6. GPU accelerated real-time confocal fluorescence lifetime imaging microscopy (FLIM) based on the analog mean-delay (AMD) method [O]. Byungyeon Kim, Byungjun Park, Seungrag Lee, 2016
7. Performance Upper Bound Analysis and Optimization of SGEMM on Fermi and Kepler GPUs [O]. Lai, Junjie, Seznec, André. 2013
