Dynamic Per-Warp Reconvergence Stack for Efficient Control Flow Handling in GPUs

Abstract

GPGPUs usually suffer performance degradation when the control flow of threads within a warp diverges. Control flow handling based on a reconvergence stack is widely adopted in GPU architectures. The depth of such a stack is always set to a large number, so that there are enough entries for warps that encounter nested branches. However, for warps that encounter only simple branches, or no branches at all, these deep reconvergence stacks sit idle, which seriously wastes hardware resources. Moreover, as GPU architectures develop and more and more warps are deployed on each GPU stream processor core, the problem could become even more serious. To solve this problem, this paper proposes a dynamic reconvergence stack structure in which a stack pool is shared by all warps, and the stack of each warp is constructed dynamically according to its run-time requirements. This satisfies the stack requirements while eliminating unnecessary waste of hardware resources. Our experiments show that the dynamic reconvergence stack can reduce the cost of the stack by 50%, with the performance of the conventional design well maintained.
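
The shared-pool idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's hardware design: the entry fields (`pc`, `rpc`, `mask`), the linked-list layout, the class and method names, and the behavior when the pool is exhausted are all assumptions made for illustration. Each warp's stack is a chain of entries drawn from one pool, so a warp with no divergence occupies no entries at all.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    pc: int                 # next PC of this control-flow path (assumed field)
    rpc: int                # reconvergence PC (assumed field)
    mask: int               # active-thread mask (assumed field)
    below: Optional[int]    # pool index of the entry beneath, None at the bottom


class StackPool:
    """Hypothetical pool of stack entries shared by all warps."""

    def __init__(self, num_entries: int) -> None:
        self.entries: List[Optional[Entry]] = [None] * num_entries
        self.free: List[int] = list(range(num_entries))  # unused pool slots
        self.top: Dict[int, int] = {}  # warp id -> pool index of its top entry

    def push(self, warp: int, pc: int, rpc: int, mask: int) -> bool:
        """Take one slot from the pool and link it on top of the warp's stack."""
        if not self.free:
            return False  # assumption: the caller stalls the warp and retries
        idx = self.free.pop()
        self.entries[idx] = Entry(pc, rpc, mask, self.top.get(warp))
        self.top[warp] = idx
        return True

    def pop(self, warp: int) -> Entry:
        """Unlink the warp's top entry and return its slot to the pool."""
        idx = self.top[warp]
        entry = self.entries[idx]
        assert entry is not None
        self.entries[idx] = None
        self.free.append(idx)
        if entry.below is None:
            del self.top[warp]
        else:
            self.top[warp] = entry.below
        return entry

    def depth(self, warp: int) -> int:
        """Count the entries currently linked into the warp's stack."""
        count, idx = 0, self.top.get(warp)
        while idx is not None:
            count += 1
            idx = self.entries[idx].below
        return count
```

With a pool of four entries, one warp can hold three entries while another holds one, which fixed per-warp stacks of depth two could not accommodate even though the total storage is the same.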


Bibliographic details

Source: IEEE Computer Society Annual Symposium on VLSI, 2016, pp. 176-181 (6 pages)
Authors: Yaohua Wang; Xiaowen Chen; Dong Wang; Sheng Liu
Original format: PDF
Keywords: Very large scale integration; Hardware

