This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters equipped with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak memory bandwidth on an NVIDIA Tesla C1060 GPU and sustains over 50 GFlops. A multi-GPU implementation that combines MPI with CUDA streams to overlap GPU execution with data transfers scales linearly and delivers over 800 GFlops on a cluster with 16 GPUs. The paper presents the optimizations required to achieve this level of performance.
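The overlap of GPU execution with data transfers mentioned above is the standard CUDA-streams pattern. A minimal sketch is given below; it is not the paper's actual code, and names such as `jacobi_interior`, `dev_boundary_plane`, and the MPI arguments are illustrative assumptions.

```cuda
// Hypothetical sketch of compute/transfer overlap with CUDA streams.
// Assumes host_halo was allocated with cudaMallocHost (pinned memory),
// which is required for cudaMemcpyAsync to overlap with kernel execution.
cudaStream_t compute_stream, copy_stream;
cudaStreamCreate(&compute_stream);
cudaStreamCreate(&copy_stream);

// Update the interior points on one stream while the boundary plane
// needed by the neighbouring MPI rank is copied out on another.
jacobi_interior<<<grid, block, 0, compute_stream>>>(p_new, p_old, nx, ny, nz);
cudaMemcpyAsync(host_halo, dev_boundary_plane, halo_bytes,
                cudaMemcpyDeviceToHost, copy_stream);

// Once the halo copy has finished, exchange it with the neighbour via MPI,
// then wait for the interior computation before the next iteration.
cudaStreamSynchronize(copy_stream);
MPI_Sendrecv(host_halo, halo_count, MPI_FLOAT, neighbour, 0,
             recv_halo, halo_count, MPI_FLOAT, neighbour, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
cudaStreamSynchronize(compute_stream);
```

Because the interior update does not depend on the halo being exchanged, the kernel and the device-to-host copy can proceed concurrently, hiding most of the communication cost behind computation.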