Achieving high performance and energy efficiency in superpipelined processors.

Abstract

One approach to exploiting instruction-level parallelism is superpipelining, which uses deep pipelines to achieve high clock rates. Pipeline hazards, memory latency, and power consumption are three vital factors that limit the benefits of superpipelining. This dissertation presents several novel approaches to achieve high-performance and energy-efficient superpipelined microprocessors. These approaches focus mainly on reducing pipeline stalls, memory latency, and the energy consumed by unnecessary bit switches.

To reduce the number of pipeline stalls, an optimizing instruction scheduler named Super-reorderer was built, in which in-block scheduling and cross-block scheduling are applied to minimize the number of data and structural hazards. A novel branch scheme, called branch with masked squashing, is proposed to minimize the number of control hazards. The basic idea of branch with masked squashing is to fill delay slots with safe instructions, which may come before or after the branch. The remaining unfilled delay slots are filled with instructions from the predicted target path. In the case of misprediction, only unsafe instructions are annulled; the safe instructions in branch delay slots are always executed.

To reduce memory latency, unconventional cache mapping functions, hardware-controlled instruction prefetching, and software-controlled data prefetching techniques are investigated. Two novel unconventional cache mapping functions, bit-flipping and segmented bit-selection, are proposed and evaluated. A direct-mapped cache with these unconventional cache mapping functions can achieve high hit rates while maintaining a hit time as fast as that of a direct-mapped cache with traditional mapping. A novel technique for software-controlled data prefetching is proposed in which the starting data in a data region of a working set is prefetched by software and the subsequent data in the data region is prefetched by hardware.
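The annulment rule of branch with masked squashing, as the abstract describes it, can be illustrated with a small model. This is only a sketch under stated assumptions: the abstract does not say how the mask is encoded, so the one-bit-per-delay-slot mask, the names `SlotInstr`, `squash_mask`, and `executed_slots`, and the instruction strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlotInstr:
    text: str   # hypothetical instruction text, for illustration only
    safe: bool  # True: may execute on either path; False: from the predicted path


def squash_mask(slots):
    """Assumed encoding: bit i is 1 when delay slot i holds an unsafe instruction."""
    mask = 0
    for i, instr in enumerate(slots):
        if not instr.safe:
            mask |= 1 << i
    return mask


def executed_slots(slots, mispredicted):
    """Return the delay-slot instructions that are not annulled."""
    mask = squash_mask(slots)
    result = []
    for i, instr in enumerate(slots):
        # Only masked (unsafe) slots are annulled, and only on a misprediction.
        annul = mispredicted and ((mask >> i) & 1) == 1
        if not annul:
            result.append(instr.text)
    return result


slots = [
    SlotInstr("add r1, r2, r3", safe=True),   # safe filler instruction
    SlotInstr("ld r4, 0(r5)", safe=False),    # taken from the predicted target path
    SlotInstr("sub r6, r6, r7", safe=False),  # taken from the predicted target path
]

print(executed_slots(slots, mispredicted=False))
print(executed_slots(slots, mispredicted=True))
```

With a correct prediction all three slots execute; with a misprediction only the safe `add` survives, which is the behavior the abstract attributes to the scheme.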
One limitation of software-controlled data prefetching techniques is the execution overhead caused by prefetch instructions. A novel instruction set is proposed in which non-memory-access operations are combined with an optional prefetch operation to effectively eliminate the execution overhead caused by a prefetch instruction. A novel hardware-controlled instruction prefetching technique, called branch correlation-based cache prefetching (BCCP), is proposed. BCCP, which takes advantage of the high accuracy of correlation-based branch prediction and of aggressive cache-line look-ahead prefetching, is able to effectively hide long instruction cache latency.

To reduce energy consumption in a modern instruction set processor, several novel hardware and software techniques are investigated. A software technique, called Cold Scheduling, is proposed to reduce energy consumption in the control path. The basic idea is to apply compilation techniques to reorder instruction sequences so that the amount of bit switching on the control path is minimal during program execution. Dynamic power management, which automatically shuts down power to unused functional units during program execution, is investigated to reduce energy consumption in the data path. Two novel cache design techniques, Gray code addressing and cache partitioning, are proposed to reduce energy consumption in the caches. The idea of Gray code addressing is to minimize the bit switches on address buses and I/O pads, which usually consume a significant amount of energy in the caches. The idea of cache partitioning is to minimize the average energy consumed by each cache access by vertically or horizontally partitioning the cache memory cell arrays. (Abstract shortened by UMI.)
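The motivation for Gray code addressing can be seen by counting bit switches on an address bus for a run of sequential addresses. The sketch below uses the standard reflected binary Gray code; the 16-address sequence and the names `to_gray` and `bit_switches` are assumptions chosen for illustration, not details taken from the dissertation.

```python
def to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def bit_switches(seq):
    """Count the bits that change between consecutive values in seq."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))


# Hypothetical access pattern: 16 sequential addresses.
addresses = list(range(16))

binary_switches = bit_switches(addresses)
gray_switches = bit_switches([to_gray(a) for a in addresses])

print("binary:", binary_switches)  # 26 bit switches
print("gray:  ", gray_switches)    # 15 bit switches, one per address step
```

For sequential accesses, consecutive Gray codes differ in exactly one bit, so the bus toggles once per step, whereas binary encoding toggles several bits whenever a carry propagates. Non-sequential access patterns would not necessarily see the same benefit.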

Bibliographic record

  • Author: Su, Ching-Long Jim
  • Author affiliation: University of Southern California
  • Degree-granting institution: University of Southern California
  • Subject: Engineering, Electronics and Electrical; Computer Science
  • Degree: Ph.D.
  • Year: 1995
  • Pages: 340 p.
  • Total pages: 340
  • Original format: PDF
  • Language: English
