Journal: Microprocessors and Microsystems

Register port complexity reduction in wide-issue processors with selective instruction execution


Abstract

As the issue width of a processor grows, the complexity of a multi-ported register file (RF) grows more than linearly, leading to longer register access time and higher power consumption. Analysis of SPEC2000 programs reveals that only a small portion of the instructions in a program (16% in integer and 38% in floating-point) require two source operands. Moreover, when the programs are executed on an 8-wide processor, only a few (two or fewer) two-source instructions are executed in a cycle for most of the time (more than 98% for integer and 93% for floating-point), which leaves the register port bandwidth significantly under-utilized. In this paper, we propose a novel technique that significantly reduces the number of register ports by making a very minor modification to the select logic so that only a limited number of two-source instructions issue each cycle. This is achieved with no significant impact on the processor's overall performance. The novelty of the technique is that it is easy to implement and reduces the access time, power, and area of the register file without aggravating these factors in any other logic on the chip. With this technique in an 8-wide processor, compared to a conventional 128-entry RF with 16 read ports, a register file for integer programs can be designed with 11 or 10 read ports, since these configurations degrade instructions per cycle (IPC) by only 0.929% and 3.38%, respectively. This small IPC degradation comes with a reduction in register access time of 9% and 12%, respectively, and a reduction in power of 35% and 50%, respectively. For floating-point (FP) programs, a register file can be designed with 12 read ports (1.16% IPC loss, 8% less access time, and 28% less power) or with 11 read ports (3.5% IPC loss, 9% less access time, and 35% less power).
The paper analyzes the performance of all possible variants of the proposed technique for the register file in both 4-wide and 8-wide processors, giving the designer a choice of trade-offs between performance and register port complexity.
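
The abstract does not spell out the select logic or the port arithmetic, so the following is only a minimal sketch under stated assumptions. It assumes that each issued instruction reads at most two register operands, that the select logic picks ready instructions oldest first, and that the read-port count equals the issue width plus the cap on two-source instructions (which would match 8 + 3 = 11, 8 + 2 = 10, and 8 + 4 = 12 in the configurations above). The names `Instr`, `select`, and `read_ports_needed` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    num_sources: int  # register source operands read at issue: 0, 1, or 2


def select(ready, issue_width, max_two_source):
    """Pick the instructions to issue this cycle (assumed policy, not the paper's).

    `ready` is assumed to be ordered oldest first. At most `max_two_source`
    of the selected instructions may read two register operands; any further
    two-source instruction is skipped and waits for a later cycle.
    """
    issued = []
    two_source_used = 0
    for instr in ready:
        if len(issued) == issue_width:
            break
        if instr.num_sources == 2:
            if two_source_used == max_two_source:
                continue  # cap reached: leave it in the queue
            two_source_used += 1
        issued.append(instr)
    return issued


def read_ports_needed(issue_width, max_two_source):
    # Assumed worst case: every slot reads one register and the capped
    # two-source slots each read one more.
    return issue_width + max_two_source
```

For example, under these assumptions `read_ports_needed(8, 3)` gives 11 ports, and a cycle with five ready two-source instructions ahead of six one-source instructions would issue three of the former and five of the latter, reading 11 registers in total.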
