IEEE International Symposium on High Performance Computer Architecture

BOLT: Energy-Efficient Out-of-Order Latency-Tolerant Execution

Abstract

LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP without physically scaling the issue queue and register file and increases MLP without additional software threads that can reduce cache performance. Unfortunately, proposed LT designs are not energy efficient. They require too many additional structures and they defer and re-execute too many instructions to justify their performance gains. In this paper, we address these inefficiencies. We introduce a microarchitecture called BOLT (Better Out-of-Order Latency-Tolerance) that implements LT as an alternative use of SMT (Simultaneous Multi-Threading). We also present a new slice buffer organization and traversal scheme that increases performance and reduces overhead by pruning instances of useless and redundant LT. Collectively, these modifications turn out-of-order LT into a technique that improves performance in an energy-efficient way.
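The deferral step described in the abstract can be sketched in a few lines. This is only an illustrative model, not the BOLT design: the `Instr` record, the `llc_miss` flag, and the `split_forward_slice` function are hypothetical names, and the sketch assumes that the missing load itself is treated as the root of its slice. It marks the destination of a missing load as "poisoned", propagates that mark to every instruction that reads a poisoned register, and moves those instructions to a slice buffer while the independent ones execute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str               # destination register
    srcs: tuple             # source registers
    llc_miss: bool = False  # hypothetical flag: this load misses in the LLC


def split_forward_slice(program):
    """Split a program into instructions that can execute now and the
    forward slice of LLC misses, which is deferred to a slice buffer."""
    poisoned = set()        # registers whose value depends on a pending miss
    executed, slice_buffer = [], []
    for ins in program:
        if ins.llc_miss or any(s in poisoned for s in ins.srcs):
            # Miss-dependent: defer it and poison its destination.
            poisoned.add(ins.dest)
            slice_buffer.append(ins)
        else:
            # Miss-independent: executes now; its destination holds a
            # valid value again even if it was poisoned earlier.
            poisoned.discard(ins.dest)
            executed.append(ins)
    return executed, slice_buffer


program = [
    Instr("r1", ("a",), llc_miss=True),  # load that misses in the LLC
    Instr("r2", ("r1",)),                # depends on the miss
    Instr("r3", ("b",)),                 # independent load (assumed to hit)
    Instr("r4", ("r3",)),                # independent
    Instr("r5", ("r2", "r4")),           # depends on the miss through r2
]

executed, deferred = split_forward_slice(program)
print("executed now:", [i.dest for i in executed])
print("slice buffer:", [i.dest for i in deferred])
```

When the miss returns, an LT core would re-execute the slice buffer contents in order. The sketch leaves that step out, along with the pruning of useless and redundant deferrals that the abstract attributes to BOLT.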
