Conference: Annual IEEE/ACM International Symposium on Microarchitecture

Bungee jumps: Accelerating indirect branches through HW/SW co-design



Abstract

Indirect branches have historically been a challenge for microarchitectures and code generators alike. The recent steady increase in indirect branch predictability has translated into continual performance improvements, especially for out-of-order processors, which benefit more readily from improvements in branch prediction. In contrast, in-order processors, which rely on code generators for performance, are still challenged by indirect branches: they are a frequent source of issue stalls, and the large number of indirect branch targets and the unbiased nature of indirect branches complicate the use of traditional branch handling techniques such as assert conversion and predication. To address these limitations, we propose an ISA enhancement, with associated code transformation and hardware support, that collectively enables code generators for in-order processors to directly leverage the current trend of improved indirect branch predictability. By separating the prediction point of an indirect branch from its resolution point, we enable code generators to emit schedules that more closely match those found by out-of-order hardware. Our technique is particularly beneficial to processors that leverage dynamic binary translation and optimization, such as Transmeta's Efficeon and, more recently, Nvidia's Project Denver. On a set of indirect-branch-intensive benchmarks from SPEC 2006, 2000 and 95, we achieve a geomean speedup of 11% on a 4-wide processor. We further demonstrate speedups of 23% and 14% on PHP and Python benchmarks, respectively.
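The claim that indirect branches are unbiased yet increasingly predictable can be made concrete with a toy experiment. The Python sketch below is not the paper's mechanism; the function names, the synthetic trace and the one-deep target-history predictor are all illustrative assumptions. It compares a static guess of each branch's most frequent target (roughly what a code generator could bake in at translation time, e.g. via assert conversion) with a small dynamic predictor keyed on the branch address and that branch's previous target.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical trace format: (branch_pc, resolved_target) per dynamic indirect branch.
Trace = List[Tuple[int, int]]


def static_bias_accuracy(trace: Trace) -> float:
    """Accuracy of always guessing each branch's single most frequent target."""
    per_branch: Dict[int, Counter] = {}
    for pc, target in trace:
        per_branch.setdefault(pc, Counter())[target] += 1
    # The best a fixed per-branch guess can do is hit that branch's top target.
    hits = sum(c.most_common(1)[0][1] for c in per_branch.values())
    return hits / len(trace)


def history_predictor_accuracy(trace: Trace) -> float:
    """Accuracy of a toy dynamic predictor indexed by (pc, previous target of that pc).

    Illustrative assumption only: real indirect predictors use longer path histories.
    """
    table: Dict[Tuple[int, Optional[int]], int] = {}
    last_target: Dict[int, Optional[int]] = {}
    hits = 0
    for pc, target in trace:
        key = (pc, last_target.get(pc))  # history is None on the first visit
        if table.get(key) == target:     # predict, then compare with the resolved target
            hits += 1
        table[key] = target              # train on the resolved target
        last_target[pc] = target         # shift this branch's history
    return hits / len(trace)


# Synthetic, hypothetical trace: one dispatch branch cycling through three handlers,
# so no single target dominates (an "unbiased" indirect branch).
trace: Trace = [(0x40, (0x100, 0x200, 0x300)[i % 3]) for i in range(300)]

if __name__ == "__main__":
    # static ≈ 0.333, dynamic ≈ 0.987
    print(f"static: {static_bias_accuracy(trace):.3f}, "
          f"dynamic: {history_predictor_accuracy(trace):.3f}")
```

On this cyclic three-target trace the static guess is right only a third of the time, while the history-indexed table misses only on its first four lookups while it warms up. Production designs such as ITTAGE-style predictors use much longer histories; the single-entry history here just keeps the example short.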
