Conference: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Implementing the Himeno benchmark with CUDA on GPU clusters

Abstract

This paper describes the use of CUDA to accelerate the Himeno benchmark on GPU clusters. The implementation is designed to optimize memory bandwidth utilization. Our approach achieves over 83% of the theoretical peak bandwidth on an NVIDIA Tesla C1060 GPU and performs at over 50 GFlops. A multi-GPU implementation that uses MPI alongside CUDA streams to overlap GPU execution with data transfers scales linearly and performs at over 800 GFlops on a cluster with 16 GPUs. The paper presents the optimizations required to achieve this level of performance.
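The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the variable and function names are hypothetical, the performance numbers are the lower bounds stated in the abstract ("over 50", "over 800"), and the 102 GB/s peak bandwidth of the Tesla C1060 is an assumption based on NVIDIA's published specification, since the abstract does not state it.

```python
# Lower bounds quoted in the abstract (the actual measurements are "over" these values).
single_gpu_gflops = 50.0      # one Tesla C1060
cluster_gflops = 800.0        # 16-GPU cluster
num_gpus = 16
bandwidth_fraction = 0.83     # fraction of theoretical peak bandwidth achieved

# ASSUMPTION: peak memory bandwidth of the Tesla C1060 in GB/s.
# This value is not given in the abstract.
assumed_peak_bw_gbs = 102.0


def parallel_efficiency(single: float, total: float, n: int) -> float:
    """Ratio of measured aggregate throughput to ideal linear scaling."""
    return total / (single * n)


eff = parallel_efficiency(single_gpu_gflops, cluster_gflops, num_gpus)
per_gpu_on_cluster = cluster_gflops / num_gpus
achieved_bw = bandwidth_fraction * assumed_peak_bw_gbs

print(f"per-GPU rate on the cluster: {per_gpu_on_cluster:.1f} GFlops")
print(f"parallel efficiency (from lower bounds): {eff:.2f}")
print(f"implied sustained bandwidth: {achieved_bw:.1f} GB/s (under the assumed peak)")
```

Because both performance figures are lower bounds, the computed efficiency only indicates that the quoted numbers are consistent with the linear scaling claim; it is not a measured value.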