Fire-and-Forget

Abstract

Modern processors use CAM-based load and store queues (LQ/SQ) to support out-of-order memory scheduling and store-to-load forwarding. However, the LQ and SQ scale poorly to the sizes required for large-window, high-ILP processors. Past research has proposed ways to make the SQ more scalable by reorganizing the CAMs or using non-associative structures. In particular, the Store Queue Index Prediction (SQIP) approach allows load instructions to predict the exact SQ index of a sourcing store and access the SQ in a much simpler and more scalable RAM-based fashion. SQIP works because loads that receive data directly from stores will usually receive the data from the same store each time. In our work, we take a slightly different view of the underlying observation used by SQIP: a store that forwards data to a load usually forwards to the same load each time. This subtle change in perspective leads to our "Fire-and-Forget" (FnF) scheme for load/store scheduling and forwarding, which eliminates the store queue entirely. The idea is that stores issue out of the reservation stations like regular instructions, and any store that forwards data to a load uses a predicted LQ index to write the value directly to the LQ entry without any associative logic. Any mispredictions/misforwardings are detected by a low-overhead pre-commit re-execution mechanism. Our original goal for FnF was to design a more scalable memory-scheduling microarchitecture than the previously proposed approaches without degrading performance. The relative infrequency of store-to-load forwarding, accurate LQ index prediction, and speculative cloaking actually combine to enable FnF to slightly outperform the competition. Specifically, our simulation results show that our SQ-less Fire-and-Forget provides a 3.3% speedup over a processor using a conventional fully-associative SQ.
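
To make the forwarding path concrete, the following is a minimal behavioral sketch in C++ of the mechanism the abstract describes: on issue, a store consults a PC-indexed predictor for the LQ index of the load it usually feeds and writes its value into that entry with a plain indexed (RAM-style) write, and a simplified stand-in for the pre-commit re-execution check compares the forwarded value against memory. The structures and names here (LQEntry, FnFPredictor, the train/predict interface) are illustrative assumptions, not the paper's actual design.

// A minimal behavioral sketch of the Fire-and-Forget forwarding path.
// All names (LQEntry, FnFPredictor, train/predict) are illustrative
// assumptions, not the paper's actual structures.
#include <cstdint>
#include <iostream>
#include <optional>
#include <unordered_map>
#include <vector>

struct LQEntry {
    uint64_t pc = 0;                      // load's program counter
    uint64_t addr = 0;                    // load's effective address
    std::optional<uint64_t> forwarded;    // value written by a forwarding store
};

// PC-indexed predictor capturing the FnF observation: a store that forwards
// usually forwards to the same load (and hence the same LQ slot) each time.
struct FnFPredictor {
    std::unordered_map<uint64_t, size_t> store_pc_to_lq_index;

    std::optional<size_t> predict(uint64_t store_pc) const {
        auto it = store_pc_to_lq_index.find(store_pc);
        if (it == store_pc_to_lq_index.end()) return std::nullopt;
        return it->second;
    }
    void train(uint64_t store_pc, size_t lq_index) {
        store_pc_to_lq_index[store_pc] = lq_index;
    }
};

int main() {
    std::vector<LQEntry> lq(4);                  // tiny RAM-style load queue
    lq[2] = {0x400100, 0x1000, std::nullopt};    // one in-flight load

    FnFPredictor pred;
    pred.train(0x400080, 2);                     // learned on a prior forwarding

    // A store issues from the reservation stations like any other instruction.
    uint64_t store_pc = 0x400080, store_value = 42;

    if (auto idx = pred.predict(store_pc)) {
        // "Fire and forget": a plain indexed write into the predicted LQ entry,
        // with no associative search and no store queue at all.
        lq[*idx].forwarded = store_value;
    }

    // Simplified stand-in for the pre-commit re-execution check: re-read
    // memory and compare; a mismatch would squash and re-execute the load.
    uint64_t memory_value = 42;                  // pretend cache read at lq[2].addr
    if (lq[2].forwarded) {
        bool misforward = (*lq[2].forwarded != memory_value);
        std::cout << "forwarded " << *lq[2].forwarded
                  << (misforward ? " (misforward: squash)" : " (verified)") << "\n";
    }
    return 0;
}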
