Journal: Concurrency, practice and experience

Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC-9 Perlmutter system


Abstract

The Roofline performance model provides an intuitive and insightful approach to identifying performance bottlenecks and guiding performance optimization. In preparation for the next-generation supercomputer Perlmutter at NERSC, this paper presents a methodology to construct a hierarchical Roofline on NVIDIA GPUs and extends it to support reduced precision and Tensor Cores. The hierarchical Roofline incorporates L1, L2, device memory, and system memory bandwidths into one single figure, and it offers more profound insights into performance analysis than the traditional DRAM-only Roofline. We use our Roofline methodology to analyze three proxy applications: GPP from BerkeleyGW, HPGMG from AMReX, and conv2d from TensorFlow. In doing so, we demonstrate the ability of our methodology to readily understand various aspects of performance and performance bottlenecks on NVIDIA GPUs and motivate code optimizations.
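The hierarchical idea in the abstract can be illustrated with a minimal sketch: for each memory level, arithmetic intensity is the kernel's FLOP count divided by the bytes moved at that level, and the attainable performance is the smaller of the compute peak and intensity times bandwidth. All names here (`MemoryLevel`, `hierarchical_roofline`, `tightest_level`) are hypothetical, and the peak, bandwidth, and kernel numbers are round placeholders chosen for illustration, not measurements from the paper or from any real GPU. System memory is omitted for brevity but would be one more entry in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    """One level of the memory hierarchy (hypothetical helper type)."""
    name: str
    bandwidth_gbs: float  # sustained bandwidth in GB/s (placeholder value)


def attainable_gflops(peak_gflops, intensity, bandwidth_gbs):
    """Classic Roofline bound: min(compute peak, intensity * bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def hierarchical_roofline(peak_gflops, flops, bytes_per_level, levels):
    """Return {level name: (arithmetic intensity, bound in GFLOP/s)}.

    flops           -- total floating-point operations of the kernel
    bytes_per_level -- bytes moved at each level, keyed by level name
    """
    results = {}
    for level in levels:
        intensity = flops / bytes_per_level[level.name]  # FLOPs per byte
        bound = attainable_gflops(peak_gflops, intensity, level.bandwidth_gbs)
        results[level.name] = (intensity, bound)
    return results


def tightest_level(results):
    """Name of the level with the lowest performance bound."""
    return min(results, key=lambda name: results[name][1])


# Placeholder machine and kernel characteristics (assumed, not measured).
PEAK_GFLOPS = 1000.0
LEVELS = [
    MemoryLevel("L1", 1000.0),
    MemoryLevel("L2", 500.0),
    MemoryLevel("HBM", 100.0),
]
KERNEL_FLOPS = 8.0e9
KERNEL_BYTES = {"L1": 16.0e9, "L2": 8.0e9, "HBM": 2.0e9}

roofline = hierarchical_roofline(PEAK_GFLOPS, KERNEL_FLOPS, KERNEL_BYTES, LEVELS)
for name, (intensity, bound) in roofline.items():
    print(f"{name}: AI = {intensity:.2f} FLOP/byte, bound = {bound:.1f} GFLOP/s")
print("Tightest bound:", tightest_level(roofline))
```

With these placeholder inputs the device-memory level gives the lowest bound, which is the kind of per-level comparison that a DRAM-only Roofline cannot show for the cache levels.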
