IEEE Transactions on Computers

Exploiting Operand Availability for Efficient Simultaneous Multithreading

Abstract

We propose several schemes to improve the scalability, reduce the complexity and delays, and increase the throughput of dynamic scheduling in SMT processors. Our first design is an adaptation of the proposed instruction packing to SMT. Instruction packing opportunistically packs two instructions (possibly from different threads), each with at most one nonready source operand at the time of dispatch, into the same issue queue entry. Our second design, termed 2OP_BLOCK, takes these ideas one step further and completely avoids dispatching instructions with two nonready source operands. This technique has several advantages. First, it reduces the scheduling complexity (and the associated delays) because the logic needed to support instructions with two nonready source operands is eliminated. More surprisingly, 2OP_BLOCK simultaneously improves performance: the same issue queue entry may be reallocated multiple times to instructions with at most one nonready source (which usually spend fewer cycles in the queue), rather than being held by a single instruction that enters the queue with two nonready sources. For schedulers with the capacity to hold 64 instructions on a 4-way SMT, the 2OP_BLOCK design outperforms the traditional queue by 14 percent, on average, and at the same time results in a 10 percent reduction in the overall scheduling delay. We also present mechanisms to support speculative scheduling with 2OP_BLOCK and introduce a hybrid scheme that dynamically switches between 2OP_BLOCK and instruction packing modes depending on the workload characteristics, to achieve further performance gains.
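The 2OP_BLOCK dispatch rule described in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the names `Instr`, `nonready_count`, and `dispatch_2op_block` are hypothetical, and the choices to dispatch at most one instruction per thread per cycle, in program order, and to let a held-back thread simply wait while other threads continue are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    thread: int      # hardware thread the instruction belongs to
    name: str        # label used only for printing
    sources: tuple   # names of the source registers


def nonready_count(instr, ready_regs):
    """Number of source operands whose values are not yet available."""
    return sum(1 for src in instr.sources if src not in ready_regs)


def dispatch_2op_block(thread_queues, ready_regs, free_entries):
    """Model one dispatch cycle under a 2OP_BLOCK-style rule.

    Assumptions (not taken from the source): one instruction per thread
    per cycle, in-order dispatch within a thread, and a thread whose head
    instruction has two nonready sources waits while others proceed.
    """
    dispatched = []
    for queue in thread_queues:
        if free_entries == 0:
            break                      # issue queue is full
        if not queue:
            continue                   # thread has nothing to dispatch
        head = queue[0]
        if nonready_count(head, ready_regs) <= 1:
            dispatched.append(queue.popleft())
            free_entries -= 1
        # otherwise the instruction stays out of the issue queue
    return dispatched


queues = [
    deque([Instr(0, "add", ("r1", "r2"))]),  # both sources nonready: held back
    deque([Instr(1, "mul", ("r3", "r4"))]),  # only r4 nonready: dispatched
]
issued = dispatch_2op_block(queues, ready_regs={"r3"}, free_entries=4)
print([i.name for i in issued])  # ['mul']
```

In this model the `add` never occupies a queue entry while both of its operands are outstanding, so the entry remains available to instructions that are likely to leave the queue sooner, which is the intuition the abstract gives for the throughput gain.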
