
Checkpointed early load retirement

Abstract

Long-latency loads are critical in today's processors due to the ever-increasing speed gap with memory. Not only do these loads block the execution of dependent instructions, they also prevent other instructions from moving through the in-order reorder buffer (ROB) and retiring. As a result, the processor quickly fills up with uncommitted instructions, and computation ultimately stalls. To attack this problem, we propose checkpointed early load retirement, a mechanism that combines register checkpointing and back-end (i.e., at retirement) load-value prediction. When a long-latency load hits the ROB head unresolved, the processor enters clear mode by (1) taking a checkpoint of the architectural registers, (2) supplying a load-value prediction to consumers, and (3) early-retiring the long-latency load. This unclogs the ROB, thereby "clearing the way" for subsequent instructions to retire, and also allows instructions dependent on the long-latency load to execute sooner. When the actual value returns from memory, it is compared against the prediction. A misprediction causes the processor to roll back to the checkpoint, discarding all subsequent computation. The benefits of executing in clear mode come from providing early forward progress on correct predictions, and from warming up caches and other structures on wrong predictions. Our evaluation shows that a clear implementation with support for four checkpoints yields an average speedup of 1.12 for both the eleven integer and the eight floating-point applications (1.27 and 1.19 for the five integer and five floating-point memory-bound applications, respectively), relative to a contemporary out-of-order processor with an aggressive hardware prefetcher.
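
The checkpoint, predict, early-retire, and verify steps described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class `ToyCore`, its methods, and the register names are hypothetical, the register file is a plain dictionary, pending predictions are verified oldest-first, and a misprediction is modeled by restoring the snapshot and then writing the actual loaded value. The limit of four checkpoints mirrors the configuration mentioned in the evaluation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical toy model of clear mode (illustration only)."""
    regs: dict = field(default_factory=dict)
    # Each entry: (snapshot of regs, destination register, predicted value).
    checkpoints: list = field(default_factory=list)
    max_checkpoints: int = 4  # assumption taken from the evaluated configuration

    def early_retire_load(self, dest, predicted):
        """A long-latency load reaches the ROB head unresolved."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False  # no free checkpoint: the load stalls as usual
        # (1) Checkpoint the architectural registers.
        self.checkpoints.append((dict(self.regs), dest, predicted))
        # (2) Supply the predicted value to consumers; (3) the load retires.
        self.regs[dest] = predicted
        return True

    def resolve_oldest_load(self, actual):
        """The actual value returns from memory; verify the oldest prediction."""
        snapshot, dest, predicted = self.checkpoints.pop(0)
        if actual == predicted:
            return True  # correct prediction: speculative work is kept
        # Misprediction: roll back, discarding all later computation
        # (including any younger checkpoints), then use the actual value.
        self.regs = snapshot
        self.checkpoints.clear()
        self.regs[dest] = actual
        return False


# Correct prediction: the dependent result survives.
good = ToyCore(regs={"r1": 10})
good.early_retire_load("r2", predicted=7)
good.regs["r3"] = good.regs["r1"] + good.regs["r2"]  # dependent executes early
good_ok = good.resolve_oldest_load(actual=7)

# Wrong prediction: the dependent result is discarded on rollback.
bad = ToyCore(regs={"r1": 10})
bad.early_retire_load("r2", predicted=7)
bad.regs["r3"] = bad.regs["r1"] + bad.regs["r2"]
bad_ok = bad.resolve_oldest_load(actual=9)
```

In the first run the value computed into `r3` is kept, which corresponds to the early forward progress on correct predictions. In the second run `r3` disappears after the rollback; the model does not capture the cache warm-up benefit that the abstract attributes to wrong predictions.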
