Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Bank Stealing for a Compact and Efficient Register File Architecture in GPGPU



Abstract

Modern general-purpose graphics processing units (GPGPUs) have emerged as pervasive alternatives for parallel high-performance computing. The extreme multithreading in modern GPGPUs demands a large register file (RF), which is typically organized into multiple banks to support the massive parallelism. Although a heavily banked structure benefits RF throughput, its area and energy costs, combined with diminishing performance gains, greatly limit future RF scaling. In this paper, we propose an improved RF design with bank stealing techniques, which enable high RF throughput in a compact area. By closely investigating the GPGPU microarchitecture, we find that state-of-the-art RF designs are far from optimal because of poor bank utilization, which is the intrinsic limitation to high RF throughput and compact RF area. We investigate the causes of bank conflicts and find that most conflicts can be eliminated by exploiting the fact that a highly banked RF is often underutilized. This is especially true in GPGPUs, where multiple ready warps are available at the scheduling stage and their operand accesses can be carefully coordinated. We propose two lightweight bank stealing techniques that opportunistically fill idle banks and register entries for better operand service. With the proposed architecture, average GPGPU performance improves under a smaller energy budget with significant area savings, which makes the design promising for sustainable RF scaling.
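To make the notions of bank conflict and idle bank concrete, the following minimal sketch counts how many cycles one instruction's operand reads would take in a banked RF, and how many bank slots sit idle meanwhile. It is an illustration only, not the paper's design: the modulo register-to-bank mapping, the one-read-per-bank-per-cycle rule, and the names `bank_of` and `access_stats` are all assumptions made for this example.

```python
from collections import Counter


def bank_of(reg: int, num_banks: int) -> int:
    """Hypothetical mapping: register index modulo the number of banks."""
    return reg % num_banks


def access_stats(operand_regs, num_banks):
    """Return (cycles, idle_slots) for one instruction's operand reads.

    Assumes each bank can serve exactly one read per cycle, so reads that
    map to the same bank are serialized (a bank conflict).
    """
    per_bank = Counter(bank_of(r, num_banks) for r in operand_regs)
    # The most heavily loaded bank determines how long the reads take.
    cycles = max(per_bank.values(), default=0)
    # Every bank offers one slot per cycle; unused slots are idle capacity.
    idle_slots = cycles * num_banks - len(operand_regs)
    return cycles, idle_slots


# Three operands that all land in bank 0 of a 4-bank RF: fully serialized.
print(access_stats([0, 4, 8], 4))  # (3, 9)
# Three operands in three different banks: served in a single cycle.
print(access_stats([1, 2, 3], 4))  # (1, 1)
```

In the conflicting case, nine bank slots go unused while a single bank is busy for three cycles. That kind of idle capacity is what the abstract describes as underutilization that reads from other ready warps could opportunistically fill.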
