Identifying Optimization Opportunities Within Kernel Execution in GPU Codes

Abstract

Tuning codes for GPGPU architectures is challenging because few performance tools can pinpoint the exact causes of execution bottlenecks. While profiling applications can reveal execution behavior on a particular architecture, the abundance of collected information can also overwhelm the user. Moreover, performance counters provide cumulative values but do not attribute events to code regions, which makes identifying performance hot spots difficult. This research focuses on characterizing the behavior of GPU application kernels and their performance at the node level by providing a visualization and metrics display that indicates the behavior of the application with respect to the underlying architecture. We demonstrate the effectiveness of our techniques with LAMMPS and LULESH application case studies on a variety of GPU architectures. By sampling instruction mixes for kernel execution runs, we reveal a variety of intrinsic program characteristics relating to computation, memory, and control flow.

Bibliographic record

Source
Workshop on big data management in clouds; Euro-Par 2015 International workshops; Workshop on parallel and distributed computing education for undergraduate students; Workshop on algorithms, models, and tools for parallel computing on heterogeneous platforms; Workshop on large-scale distributed virtual environments; Workshop on on-chip memory hierarchies and interconnects: organization, management and implementation; Workshop on parallel distributed agent-based simulations; Workshop on performance engineering for large-scale graph analytics; Workshop on reproducibility in parallel computing; Workshop on resiliency in high-performance computing with clouds, grids, and clusters; Workshop on runtime and operating systems for the many-core era; Workshop on unconventional high performance computing; Workshop on virtualization in high-performance cloud computing. 2015, pp. 185-196 (12 pages)
Conference location: Vienna (AT)
Authors
Robert Lim; Allen Malony; Boyana Norris; Nick Chaimov;
Author affiliations

Performance Research Laboratory High-Performance Computing Laboratory University of Oregon Eugene OR USA;

Original format: PDF

Similar literature

1. Using machine learning techniques to analyze the performance of concurrent kernel execution on GPUs [J]. Pablo Carvalho, Esteban Clua, Aline Paes. Future generation computer systems. 2020, issue Deca
2. Countering unauthorized code execution on commodity kernels: A survey of common interfaces allowing kernel code modification [J]. Trent Jaeger, Paul C. van Oorschot, Glenn Wurster. Computers & Security. 2011, issue 8
3. Kernel Tuner: A search-optimizing GPU code auto-tuner [J]. van Werkhoven Ben. Future generation computer systems. 2019, issue JANa
4. Identifying Optimization Opportunities Within Kernel Execution in GPU Codes [C]. Robert Lim, Allen Malony, Boyana Norris. International Conference on Parallel and Distributed Computing. 2015
5. Throughput Optimization and Resource Allocation on GPUs Under Multi-Application Execution [D]. Punyala, Srinivasa Reddy. 2017
6. Next-generation acceleration and code optimization for light transport in turbid media using GPUs [O]. Erik Alerstam, William Chun Yip Lo, Tianyi David Han. 2010
7. GPU code optimization using abstract kernel emulation and sensitivity analysis [O]. Changwan Hong, Aravind Sukumaran-Rajam, Jinsung Kim. 2018
