Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality

Abstract

Hundreds of cores per chip and support for fine-grain multithreading have made GPUs a central player in today's HPC world. For many applications, however, achieving a high fraction of peak performance on current GPUs still requires significant programmer effort. A key consideration for optimizing GPU code is determining a suitable amount of work to be performed by each thread. Thread granularity not only has a direct impact on occupancy but can also influence data locality at the register and shared-memory levels. This paper describes a software framework to analyze dependencies in parallel GPU threads and perform source-level restructuring to obtain GPU kernels with varying thread granularity. The framework supports specification of coarsening factors through source-code annotation and also implements a heuristic based on estimated register pressure that automatically recommends coarsening factors for improved memory performance. We present preliminary experimental results on a select set of CUDA kernels. The results show that the proposed strategy is generally able to select profitable coarsening factors. More importantly, the results demonstrate a clear need for automatic control of thread granularity at the software level for achieving higher performance.
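
The idea of a coarsening factor can be illustrated with a small sketch. This is not the authors' framework and not CUDA code: it is a CPU emulation in Python, and the function names `stencil_baseline` and `stencil_coarsened`, the three-point stencil, and the load counter are all assumptions made for illustration. Each loop iteration over `tid` stands in for one GPU thread. In the coarsened version, one thread produces `factor` consecutive outputs and keeps neighbouring inputs in local variables (standing in for registers), so values that separate fine-grained threads would each load are loaded once.

```python
def stencil_baseline(a):
    """One emulated thread per interior output; each thread loads 3 inputs."""
    n = len(a)
    out = [0.0] * n
    loads = 0
    for tid in range(1, n - 1):  # each iteration emulates one fine-grained thread
        out[tid] = a[tid - 1] + a[tid] + a[tid + 1]
        loads += 3
    return out, loads


def stencil_coarsened(a, factor):
    """Each emulated thread computes `factor` consecutive outputs.

    `factor` is the (hypothetical) coarsening factor. Inputs shared by
    adjacent outputs are kept in local variables instead of being reloaded.
    """
    n = len(a)
    out = [0.0] * n
    loads = 0
    interior = n - 2
    num_threads = (interior + factor - 1) // factor  # ceiling division
    for tid in range(num_threads):
        start = 1 + tid * factor
        end = min(start + factor, n - 1)  # last thread may get fewer outputs
        left, mid = a[start - 1], a[start]  # initial "register" contents
        loads += 2
        for i in range(start, end):
            right = a[i + 1]  # the only new load per output
            loads += 1
            out[i] = left + mid + right
            left, mid = mid, right  # slide the window, reusing loaded values
    return out, loads


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    base_out, base_loads = stencil_baseline(data)
    for factor in (1, 2, 4):
        coarse_out, coarse_loads = stencil_coarsened(data, factor)
        assert coarse_out == base_out
        print(factor, base_loads, coarse_loads)
```

In this emulation a larger factor lowers the number of loads while leaving the result unchanged. On a real GPU, coarsening also tends to raise per-thread register use and reduce the number of threads, which is the trade-off against occupancy that the abstract describes and the reason a register-pressure estimate is a plausible guide for choosing the factor.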


Bibliographic details

Source: Compiler Construction, 2012, pp. 21-40 (20 pages)
Conference location: Tallinn (EE)
Authors: Swapneela Unkule; Christopher Shaltz; Apan Qasem
Author affiliation: Texas State University, San Marcos, TX 78666, USA
Original format: PDF
Language: English
Chinese Library Classification: Computer software
