Implementing the Himeno benchmark with CUDA on GPU clusters


Abstract

This paper describes the use of CUDA to accelerate the Himeno benchmark on clusters with GPUs. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that uses MPI alongside CUDA streams to overlap GPU execution with data transfers scales linearly and performs at over 800 GFlops on a cluster with 16 GPUs. The paper presents the optimizations required to achieve this level of performance.
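The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, the performance numbers are the lower bounds stated in the abstract ("over 50 GFlops", "over 800 GFlops", "over 83%"), and the 102 GB/s peak memory bandwidth for the Tesla C1060 is an assumption taken from the vendor specification, not from the abstract.

```python
# Figures quoted in the abstract (all are stated as lower bounds).
SINGLE_GPU_GFLOPS = 50.0    # "over 50 GFlops" on one Tesla C1060
CLUSTER_GFLOPS = 800.0      # "over 800 GFlops" on the cluster
NUM_GPUS = 16               # cluster size
BANDWIDTH_FRACTION = 0.83   # "over 83% of the theoretical peak bandwidth"

# Assumption, not stated in the abstract: Tesla C1060 peak memory
# bandwidth of 102 GB/s (vendor specification).
ASSUMED_PEAK_BANDWIDTH_GBS = 102.0


def per_gpu_gflops(total_gflops, num_gpus):
    """Average throughput per GPU in the multi-GPU run."""
    return total_gflops / num_gpus


def scaling_efficiency(total_gflops, num_gpus, single_gpu_gflops):
    """Ratio of measured cluster throughput to ideal linear scaling."""
    return total_gflops / (num_gpus * single_gpu_gflops)


def sustained_bandwidth_gbs(peak_gbs, fraction):
    """Memory bandwidth implied by a given fraction of peak."""
    return peak_gbs * fraction


if __name__ == "__main__":
    print("GFlops per GPU:", per_gpu_gflops(CLUSTER_GFLOPS, NUM_GPUS))
    print("Scaling efficiency:",
          scaling_efficiency(CLUSTER_GFLOPS, NUM_GPUS, SINGLE_GPU_GFLOPS))
    print("Implied sustained bandwidth (GB/s):",
          sustained_bandwidth_gbs(ASSUMED_PEAK_BANDWIDTH_GBS, BANDWIDTH_FRACTION))
```

With these inputs, the 16-GPU result averages 50 GFlops per GPU, which matches the single-GPU figure and is consistent with the linear scaling the abstract reports. Because every quoted number is a lower bound, the ratios are indicative rather than exact.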


Bibliographic details

Source: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-10 (10 pages)
Conference location: Atlanta, GA (US)
Authors: Phillips, Everett H.; Fatica, Massimiliano
Author affiliation: NVIDIA Corporation, Santa Clara, California, United States
Original format: PDF
Language: English
Chinese Library Classification: TP311.133

