Conference: ACM/IEEE Annual International Symposium on Computer Architecture

Slipstream Processors Revisited: Exploiting Branch Sets

Abstract

Delinquent branches and loads remain key performance limiters in some applications. One approach to mitigating them is pre-execution. Broadly, there are two classes of pre-execution: one class repeatedly forks small helper threads, each targeting an individual dynamic instance of a delinquent branch or load; the other class begins with two redundant threads in a leader-follower arrangement and speculatively reduces the leading thread. The objective of this paper is to design a new pre-execution microarchitecture that meets four criteria: it (i) retains the simpler coordination of a leader-follower microarchitecture, (ii) is fully automated with just hardware, (iii) targets both branches and loads, and (iv) is effective. We review prior pre-execution proposals and show that none of them meets all four criteria. We develop Slipstream 2.0 to meet all four criteria. The key innovation in the space of leader-follower architectures is to remove the forward control-flow slices of delinquent branches and loads from the leading thread. This innovation overcomes key limitations of the only other hardware-only leader-follower proposals: Slipstream and Dual Core Execution (DCE). Slipstream removes backward slices of confident branches to pre-execute unconfident branches, which is ineffective in phases dominated by unconfident branches, precisely when branch pre-execution is most needed. DCE is very effective at tolerating cache-missed loads, unless their dependent branches are mispredicted. Removing forward control-flow slices of delinquent branches and delinquent loads enables two firsts, respectively: (1) leader-follower-style branch pre-execution without relying on confident instruction removal, and (2) tolerance of cache-missed loads that feed mispredicted branches. For SPEC 2006/2017 SimPoints wherein Slipstream 2.0 is auto-enabled, it achieves geomean speedups of 67%, 60%, and 12% over baseline (one core), Slipstream, and DCE, respectively.
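
The abstract reports its results as geomean (geometric mean) speedups. As a minimal sketch of how such a figure is typically computed, the Python snippet below takes per-benchmark execution times for a baseline and an improved configuration and returns the geometric mean of the per-benchmark speedup ratios. The function name `geomean_speedup` and the timing values are hypothetical and chosen only for illustration; they are not data or methodology details from the paper.

```python
import math


def geomean_speedup(baseline_times, new_times):
    """Geometric mean of per-benchmark speedups (baseline time / new time)."""
    ratios = [b / n for b, n in zip(baseline_times, new_times)]
    # Average the logs, then exponentiate: the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution times (arbitrary units), NOT results from the paper.
baseline = [8.0, 2.0]
improved = [4.0, 2.0]

speedup = geomean_speedup(baseline, improved)
print(f"geomean speedup: {(speedup - 1) * 100:.1f}%")  # prints "geomean speedup: 41.4%"
```

With these made-up inputs the per-benchmark speedups are 2.0x and 1.0x, so the geometric mean is the square root of 2, about 1.414x, which is quoted as a 41.4% speedup. The geometric mean is commonly preferred over the arithmetic mean for ratios because it does not depend on which configuration is used as the reference.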