Conference: IEEE Computer Society Annual Symposium on VLSI

Dynamic Per-Warp Reconvergence Stack for Efficient Control Flow Handling in GPUs



Abstract

GPGPUs usually suffer performance degradation when the control flow of threads within a warp diverges. Control flow handling based on a reconvergence stack is widely adopted in GPU architectures. The depth of this stack is always set to a large number so that warps with nested branches have enough entries. However, for warps that encounter only simple branches, or no branches at all, these deep reconvergence stacks sit idle, which seriously wastes hardware resources. Moreover, as GPU architectures evolve and more warps are deployed on each GPU stream processor core, the problem could become even more serious. To solve it, this paper proposes a dynamic reconvergence stack structure in which a stack pool is shared by all warps and the stack of each warp is constructed dynamically according to its run-time requirements. This satisfies the stack requirements while eliminating unnecessary waste of hardware resources. Experiments show that the dynamic reconvergence stack can reduce the cost of the stack by 50% while largely maintaining the performance of the conventional design.
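
The shared-pool idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not describe the hardware organization, so the linked-list layout, the free list, and every name here (`Entry`, `SharedStackPool`, `push`, `pop`, `depth`) are hypothetical and not taken from the paper. The point is that entries are allocated to a warp only when it diverges, so a warp with no branches consumes none of them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """One reconvergence-stack entry (fields are illustrative assumptions)."""
    reconv_pc: int = 0    # PC at which the diverged paths rejoin
    next_pc: int = 0      # next PC to execute on this path
    active_mask: int = 0  # one bit per thread in the warp
    link: int = -1        # index of the entry below it, or of the next free entry


class SharedStackPool:
    """Hypothetical pool of stack entries shared by all warps."""

    def __init__(self, num_entries: int, num_warps: int) -> None:
        if num_entries <= 0 or num_warps <= 0:
            raise ValueError("num_entries and num_warps must be positive")
        # Chain every entry into a free list: 0 -> 1 -> ... -> -1.
        self.entries: List[Entry] = [Entry(link=i + 1) for i in range(num_entries)]
        self.entries[-1].link = -1
        self.free_head = 0
        # Per-warp top-of-stack index; -1 means the warp holds no entries.
        self.top: List[int] = [-1] * num_warps

    def push(self, warp: int, reconv_pc: int, next_pc: int, active_mask: int) -> bool:
        """Take an entry from the free list and link it on top of the warp's stack."""
        if self.free_head == -1:
            # Pool exhausted. How real hardware would react is not stated
            # in the abstract; this model simply reports failure.
            return False
        idx = self.free_head
        entry = self.entries[idx]
        self.free_head = entry.link
        entry.reconv_pc, entry.next_pc, entry.active_mask = reconv_pc, next_pc, active_mask
        entry.link = self.top[warp]
        self.top[warp] = idx
        return True

    def pop(self, warp: int) -> Optional[Tuple[int, int, int]]:
        """Remove the warp's top entry and return it to the free list."""
        idx = self.top[warp]
        if idx == -1:
            return None
        entry = self.entries[idx]
        self.top[warp] = entry.link
        result = (entry.reconv_pc, entry.next_pc, entry.active_mask)
        entry.link = self.free_head
        self.free_head = idx
        return result

    def depth(self, warp: int) -> int:
        """Count the entries currently held by the warp."""
        count, idx = 0, self.top[warp]
        while idx != -1:
            count += 1
            idx = self.entries[idx].link
        return count
```

In this model a warp with deeply nested branches can hold most of the pool while a branch-free warp holds nothing, which is the resource-sharing behavior the abstract describes. The pool size relative to the number of warps is a free parameter here; the abstract reports a 50% cost reduction but does not give the sizing.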
