Slipstream Processors Revisited: Exploiting Branch Sets

Abstract

Delinquent branches and loads remain key performance limiters in some applications. One approach to mitigate them is pre-execution. Broadly, there are two classes of pre-execution: one class repeatedly forks small helper threads, each targeting an individual dynamic instance of a delinquent branch or load; the other class begins with two redundant threads in a leader-follower arrangement, and speculatively reduces the leading thread. The objective of this paper is to design a new pre-execution microarchitecture that meets four criteria: (i) retains the simpler coordination of a leader-follower microarchitecture, (ii) is fully automated with just hardware, (iii) targets both branches and loads, and (iv) is effective. We review prior pre-execution proposals and show that none of them meets all four criteria. We develop Slipstream 2.0 to meet all four criteria. The key innovation in the space of leader-follower architectures is to remove the forward control-flow slices of delinquent branches and loads from the leading thread. This innovation overcomes key limitations in the only prior hardware-only leader-follower works: Slipstream and Dual Core Execution (DCE). Slipstream removes backward slices of confident branches to pre-execute unconfident branches, which is ineffective in phases dominated by unconfident branches, when branch pre-execution is most needed. DCE is very effective at tolerating cache-missed loads, unless their dependent branches are mispredicted. Removing forward control-flow slices of delinquent branches and delinquent loads enables two firsts, respectively: (1) leader-follower-style branch pre-execution without relying on confident instruction removal, and (2) tolerance of cache-missed loads that feed mispredicted branches. For SPEC 2006/2017 SimPoints wherein Slipstream 2.0 is auto-enabled, it achieves geomean speedups of 67%, 60%, and 12% over baseline (one core), Slipstream, and DCE, respectively.
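
The abstract reports its results as geometric-mean (geomean) speedups. As a minimal sketch of how such a figure is commonly computed from per-workload measurements, the Python below takes the geomean of per-workload speedup ratios. The function name `geomean_speedup` and all cycle counts are hypothetical and chosen only for illustration; they are not taken from the paper, whose exact measurement methodology is not described on this page.

```python
import math


def geomean_speedup(baseline_cycles, new_cycles):
    """Geometric mean of per-workload speedups (baseline time / new time)."""
    if len(baseline_cycles) != len(new_cycles) or not baseline_cycles:
        raise ValueError("need two non-empty lists of equal length")
    ratios = [b / n for b, n in zip(baseline_cycles, new_cycles)]
    # Average the logarithms, then exponentiate: this is the n-th root
    # of the product of the ratios, computed without overflow.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical cycle counts for three workloads (NOT data from the paper).
baseline = [100.0, 200.0, 400.0]
candidate = [50.0, 100.0, 400.0]  # per-workload speedups: 2.0, 2.0, 1.0

g = geomean_speedup(baseline, candidate)
print(f"geomean speedup: {(g - 1) * 100:.1f}%")
```

A geomean speedup of 1.67 under this convention corresponds to the "67%" style of reporting used in the abstract. The geometric mean is the usual choice for averaging ratios because the result does not depend on which configuration is taken as the reference.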

Bibliographic details

Source
ACM/IEEE Annual International Symposium on Computer Architecture, 2020, pp. 105-117 (13 pages)
Authors
Vinesh Srinivasan; Rangeen Basu Roy Chowdhury; Eric Rotenberg
Original format: PDF
Keywords
branch prediction; prefetching; hard-to-predict branch; delinquent load; pre-execution; helper threads; control independence

Date indexed: 2022-08-26 13:55:25

