Journal: Computer architecture news

Warped-Compression: Enabling Power Efficient GPUs through Register Compression


Abstract

This paper presents Warped-Compression, a warp-level register compression scheme for reducing GPU power consumption. The work is motivated by the observation that the register values of threads within the same warp are similar; that is, the arithmetic difference between the registers of two successive threads is small. Removing this data redundancy through register compression reduces the effective register width, thereby creating opportunities for power reduction. GPU register files are huge because they must hold many concurrent execution contexts and enable fast context switching. As a result, the register file consumes a large fraction of total GPU chip power. GPU design trends show that register file size will continue to increase to enable even more thread-level parallelism. To reduce register file data redundancy, Warped-Compression uses a low-cost, implementation-efficient base-delta-immediate (BDI) compression scheme that takes advantage of the banked register file organization used in GPUs. Since threads within a warp write values with strong similarity, BDI can quickly compress and decompress by selecting either a single register or one of the register banks as the primary base and then computing delta values for all the other registers or banks. Warped-Compression can reduce both dynamic and leakage power. By compressing register values, each warp-level register access activates fewer register banks, which reduces dynamic power. When fewer banks are used to store the register content, leakage power can be reduced by power gating the unused banks. Evaluation results show that register compression saves 25% of the total register file power consumption.
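The base-plus-delta idea described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the warp size (32), the 32-bit register width, the 16-byte bank width, the candidate delta widths, and the names `compress`, `decompress`, and `banks_needed` are all hypothetical choices made for illustration. It takes the first thread's register as the base, stores the other threads' values as narrow signed deltas when they all fit, and counts how many banks the result would occupy.

```python
# Illustrative model of base-delta compression for one warp-wide register.
# All constants below are assumptions for this sketch, not figures from the paper.
WARP_SIZE = 32            # threads per warp (assumed)
REG_BYTES = 4             # 32-bit registers (assumed)
BANK_BYTES = 16           # hypothetical register bank width in bytes
DELTA_WIDTHS = (0, 1, 2)  # candidate delta sizes in bytes, smallest first
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def fits(delta, nbytes):
    """Return True if a signed delta is representable in nbytes bytes."""
    if nbytes == 0:
        return delta == 0
    limit = 1 << (8 * nbytes - 1)
    return -limit <= delta < limit


def compress(values):
    """Use the first thread's value as the base and try the narrowest delta width.

    Returns a dict with the base, the delta width, and the deltas of the
    remaining threads, or None if the values stay uncompressed.
    """
    base = values[0]
    deltas = [to_signed32(v - base) for v in values[1:]]
    for width in DELTA_WIDTHS:
        if all(fits(d, width) for d in deltas):
            return {"base": base, "width": width, "deltas": deltas}
    return None


def decompress(packed):
    """Rebuild the original per-thread values from the base and deltas."""
    base = packed["base"]
    return [base] + [(base + d) & MASK32 for d in packed["deltas"]]


def banks_needed(packed, n_threads=WARP_SIZE):
    """Count the banks touched by one warp-level register access."""
    if packed is None:
        size = n_threads * REG_BYTES
    else:
        size = REG_BYTES + packed["width"] * (n_threads - 1)
    return -(-size // BANK_BYTES)  # ceiling division
```

For example, a register holding `1000 + 4 * i` for thread `i` (a typical array-index pattern) compresses to a 4-byte base plus 31 one-byte deltas under these assumptions, so `banks_needed` reports 3 banks where the uncompressed register would occupy 8. In this model, the remaining banks are the ones that would not be activated on an access and could be power gated.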