Journal: Parallel Computing

A complexity-effective microprocessor design with decoupled dispatch queues and prefetching



Abstract

Continuing demand for high degrees of instruction-level parallelism (ILP) requires large dispatch queues (or centralized reservation stations) in modern superscalar microprocessors. However, such large dispatch queues inevitably bring high circuit complexity, which in turn limits the pipeline clock rate. In other words, enlarging the dispatch queue ultimately hinders attempts to raise the clock speed. This is because most of today's designs are based on a centralized dispatch queue that depends on global broadcast operations to wake up and select ready instructions. As an alternative to this conventional design, we propose hierarchically distributed dispatch queues based on access/execute decoupled architectures.

Simulation results on 14 data-intensive benchmarks show that our DDQ (Decoupled Dispatch Queues) design achieves performance comparable to that of a superscalar machine with a large dispatch queue, while using small, distributed dispatch queues that can be implemented with low hardware complexity and high clock rates.
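To make the wakeup-and-select cost concrete, here is a minimal toy model in Python. It is not the paper's DDQ mechanism: the class and function names, the queue sizes (64 entries versus four queues of 16), and the assumption that a result tag is broadcast only within one small queue are all illustrative choices made here. The sketch only counts how many tag comparisons a single broadcast triggers.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One waiting instruction and the result tags it still needs (hypothetical)."""
    name: str
    waiting_on: set


@dataclass
class DispatchQueue:
    """Toy dispatch queue that counts tag comparisons during wakeup."""
    entries: list = field(default_factory=list)
    comparisons: int = 0

    def wakeup(self, tag):
        # Broadcast: every entry in this queue compares its operands to the tag.
        for entry in self.entries:
            self.comparisons += 1
            entry.waiting_on.discard(tag)

    def select(self):
        # Issue (remove and return) the entries whose operands are all ready.
        ready = [e for e in self.entries if not e.waiting_on]
        self.entries = [e for e in self.entries if e.waiting_on]
        return ready


def make_queue(size, prefix):
    # Each entry waits on exactly one distinct tag, e.g. entry "c3" waits on "tc3".
    return DispatchQueue(
        [Entry(f"{prefix}{i}", {f"t{prefix}{i}"}) for i in range(size)]
    )


def demo():
    # Centralized case: one 64-entry queue sees every broadcast.
    central = make_queue(64, "c")
    central.wakeup("tc3")
    central_issued = [e.name for e in central.select()]

    # Distributed case (assumption): four 16-entry queues, and the tag is
    # broadcast only inside the queue that holds its consumer.
    small = [make_queue(16, f"q{k}_") for k in range(4)]
    small[1].wakeup("tq1_3")
    small_issued = [e.name for e in small[1].select()]

    distributed_comparisons = sum(q.comparisons for q in small)
    return central.comparisons, central_issued, distributed_comparisons, small_issued


if __name__ == "__main__":
    print(demo())
```

Under these assumptions the comparison count per broadcast scales with the size of the queue that receives it, which is the intuition behind preferring several small queues over one large one. Real wakeup logic compares multiple source operands per entry and must also handle communication between queues, which this sketch ignores.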
