Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC-9 Perlmutter system

Yang Charlene; Kurth Thorsten; Williams Samuel

Abstract

The Roofline performance model provides an intuitive and insightful approach to identifying performance bottlenecks and guiding performance optimization. In preparation for the next-generation supercomputer Perlmutter at NERSC, this paper presents a methodology to construct a hierarchical Roofline on NVIDIA GPUs and extends it to support reduced precision and Tensor Cores. The hierarchical Roofline incorporates L1, L2, device memory, and system memory bandwidths into one single figure, and it offers more profound insights into performance analysis than the traditional DRAM-only Roofline. We use our Roofline methodology to analyze three proxy applications: GPP from BerkeleyGW, HPGMG from AMReX, and conv2d from TensorFlow. In doing so, we demonstrate the ability of our methodology to readily understand various aspects of performance and performance bottlenecks on NVIDIA GPUs and motivate code optimizations.
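As a rough illustration of the hierarchical Roofline idea described in the abstract, the sketch below computes one ceiling per memory level using the standard Roofline bound, min(peak compute, bandwidth × arithmetic intensity). It is not the authors' tooling: the function names, the peak throughput, the bandwidths, and the byte counts are all hypothetical placeholders chosen for easy arithmetic, not measurements from any GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    name: str
    bandwidth_gbs: float  # GB/s; placeholder value, not a measured number


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Standard Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def hierarchical_roofline(peak_gflops, levels, flops, bytes_moved):
    """Return a {level name: bound in GFLOP/s} dict, one entry per memory level.

    bytes_moved maps each level name to the bytes transferred at that level,
    so every level gets its own arithmetic intensity (FLOPs per byte).
    """
    bounds = {}
    for level in levels:
        intensity = flops / bytes_moved[level.name]
        bounds[level.name] = attainable_gflops(peak_gflops, level.bandwidth_gbs, intensity)
    return bounds


def binding_level(bounds):
    """The level with the lowest bound is the one limiting the kernel."""
    return min(bounds, key=bounds.get)


# Hypothetical machine and kernel, for illustration only.
PEAK_GFLOPS = 1000.0
LEVELS = [
    MemoryLevel("L1", 8000.0),
    MemoryLevel("L2", 2000.0),
    MemoryLevel("device memory", 800.0),
    MemoryLevel("system memory", 50.0),
]
FLOPS = 1e9
BYTES_MOVED = {
    "L1": 4e9,
    "L2": 2e9,
    "device memory": 1e9,
    "system memory": 1e7,
}

bounds = hierarchical_roofline(PEAK_GFLOPS, LEVELS, FLOPS, BYTES_MOVED)
limiter = binding_level(bounds)

if __name__ == "__main__":
    for name, bound in bounds.items():
        print(f"{name}: {bound:.0f} GFLOP/s")
    print("limited by:", limiter)
```

With these placeholder inputs, only the device memory ceiling falls below the compute peak, so the hypothetical kernel would be classified as device-memory bound. Plotting the per-level intensities against their ceilings on a single log-log chart gives the kind of one-figure view the abstract describes.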

Bibliographic details

Source
Concurrency, practice and experience | 2020, Issue 20 | e5547.1-e5547.12 | 12 pages
Authors
Yang Charlene; Kurth Thorsten; Williams Samuel
Author affiliations

Lawrence Berkeley Natl Lab Natl Energy Res Sci Comp Ctr NERSC Berkeley CA 94720 USA;

Lawrence Berkeley Natl Lab Natl Energy Res Sci Comp Ctr NERSC Berkeley CA 94720 USA;

Lawrence Berkeley Natl Lab CRD Berkeley CA 94720 USA;

Original format: PDF
Language: English
Keywords
code optimization; Cray; NVIDIA GPU; performance analysis; Roofline; tensor core

