Journal: The Computer Journal

Instruction level parallelism through microthreading - A scalable approach to chip multiprocessors

Abstract

Most microprocessor chips today use an out-of-order instruction execution mechanism, which allows superscalar processors to extract reasonably high levels of instruction level parallelism (ILP). The most significant problem with this approach is the large instruction window it requires and the logic needed to support instruction issue from that window: generating wake-up signals to waiting instructions and selecting which of them to issue. A wide issue width also requires a large multi-ported register file, so that each instruction can read and write its operands simultaneously. Neither structure scales well with issue width, leading to poor performance relative to the gates used. Furthermore, to obtain this ILP, the execution of instructions must proceed speculatively. An alternative, which avoids this complexity in instruction issue and eliminates speculative execution, is the microthreaded model. This model fragments sequential code at compile time and executes the fragments out of order while maintaining in-order execution within each fragment. The only constraints on the execution of fragments are the dependencies between them, which are managed in a distributed and scalable manner using synchronizing registers. The fragments of code are called microthreads, and they capture both ILP and loop concurrency. Fragments can be interleaved on a single processor to tolerate operand latency, or distributed across many processors to achieve speedup. The implementation of this model is fully scalable. It supports distributed instruction issue and a fully scalable register file, which implements a distributed, shared-register model of communication and synchronization between multiple processors on a single chip. This paper introduces the model, compares it with current approaches and presents an analysis of some of the implementation issues. It also presents results showing scalable performance with issue width over several orders of magnitude, from the same binary code.
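
The execution model described in the abstract can be illustrated with a toy software simulation. The sketch below is not the paper's hardware design: the names `SyncRegister` and `run`, the `(dest, srcs, fn)` instruction format, and the round-robin selection policy are all assumptions made for illustration. It shows only the core idea, that each fragment issues in order, and a fragment whose operand register is still empty is passed over in favor of another fragment.

```python
from collections import deque


class SyncRegister:
    """Hypothetical synchronizing register: it starts empty and becomes full on write."""

    def __init__(self):
        self.full = False
        self.value = None


def run(threads, num_regs):
    """Interleave in-order microthreads on a single simulated processor.

    threads: list of instruction lists; each instruction is (dest, srcs, fn).
    Returns the final register values and the issue trace as (thread_id, dest) pairs.
    """
    regs = [SyncRegister() for _ in range(num_regs)]
    pcs = [0] * len(threads)
    ready = deque(tid for tid, code in enumerate(threads) if code)
    trace = []
    while ready:
        issued = False
        # Look at each waiting microthread at most once per step.
        for _ in range(len(ready)):
            tid = ready.popleft()
            dest, srcs, fn = threads[tid][pcs[tid]]
            if all(regs[s].full for s in srcs):
                # All operands are present: issue the instruction and fill dest.
                regs[dest].value = fn(*(regs[s].value for s in srcs))
                regs[dest].full = True
                pcs[tid] += 1
                trace.append((tid, dest))
                if pcs[tid] < len(threads[tid]):
                    ready.append(tid)
                issued = True
                break
            # An operand is still empty: skip this microthread for now.
            ready.append(tid)
        if not issued:
            raise RuntimeError("deadlock: every microthread waits on an empty register")
    return [r.value for r in regs], trace


# Microthread 0 consumes r0 and r1; microthread 1 produces them.
consumer = [(2, (0, 1), lambda a, b: a + b), (3, (2,), lambda a: a * 2)]
producer = [(0, (), lambda: 3), (1, (), lambda: 4)]
values, trace = run([consumer, producer], num_regs=4)
print(values)
print(trace)
```

In this example the consumer is listed first but cannot issue until the producer has filled both of its operand registers, so the trace shows the producer's two instructions ahead of the consumer's. No speculation is involved: an instruction either has its operands or waits.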