Identifying Optimization Opportunities Within Kernel Execution in GPU Codes

Abstract

Tuning codes for GPGPU architectures is challenging because few performance tools can pinpoint the exact causes of execution bottlenecks. While profiling applications can reveal execution behavior on a particular architecture, the abundance of collected information can also overwhelm the user. Moreover, performance counters provide cumulative values but do not attribute events to code regions, which makes identifying performance hot spots difficult. This research focuses on characterizing the behavior of GPU application kernels and their performance at the node level by providing a visualization and metrics display that indicate how the application behaves with respect to the underlying architecture. We demonstrate the effectiveness of our techniques with LAMMPS and LULESH application case studies on a variety of GPU architectures. By sampling instruction mixes for kernel execution runs, we reveal a variety of intrinsic program characteristics relating to computation, memory, and control flow.
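As a rough illustration of what an instruction mix is, the sketch below groups a list of sampled opcodes into computation, memory, and control-flow categories and reports the share of each. It is not the paper's tool: the opcode names, the `CATEGORY_BY_OPCODE` table, and the `instruction_mix` function are all hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical opcode-to-category table; the mnemonics and groupings are
# assumptions for illustration, not taken from the paper.
CATEGORY_BY_OPCODE = {
    "FADD": "compute", "FMUL": "compute", "FFMA": "compute", "IADD": "compute",
    "LD": "memory", "ST": "memory",
    "BRA": "control", "EXIT": "control",
}


def instruction_mix(sampled_opcodes):
    """Return the fraction of sampled instructions that falls in each category."""
    # Opcodes missing from the table are counted under "other".
    counts = Counter(CATEGORY_BY_OPCODE.get(op, "other") for op in sampled_opcodes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Made-up sample of opcodes observed during one kernel run.
samples = ["FFMA", "FFMA", "LD", "FADD", "ST", "BRA", "FFMA", "LD"]
mix = instruction_mix(samples)
for category, share in sorted(mix.items()):
    print(f"{category}: {share:.1%}")
```

A kernel whose mix is dominated by the memory category would be a candidate for memory-access tuning, while a large control share may point to branch divergence; such readings are only a first hint and depend on the architecture.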
