BOLT: Energy-Efficient Out-of-Order Latency-Tolerant Execution



Abstract

LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. An LT core increases ILP without physically scaling the issue queue and register file and increases MLP without additional software threads that can reduce cache performance. Unfortunately, proposed LT designs are not energy efficient. They require too many additional structures and they defer and re-execute too many instructions to justify their performance gains. In this paper, we address these inefficiencies. We introduce a microarchitecture called BOLT (Better Out-of-Order Latency-Tolerance) that implements LT as an alternative use of SMT (Simultaneous Multi-Threading). We also present a new slice buffer organization and traversal scheme that increases performance and reduces overhead by pruning instances of useless and redundant LT. Collectively, these modifications turn out-of-order LT into a technique that improves performance in an energy-efficient way.
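The core LT mechanism described above can be illustrated with a small toy model. A load that misses the LLC is deferred to a slice buffer, and so is every instruction that depends on it (its forward slice). Independent instructions keep executing, and the slice is re-executed in program order when the miss returns. This is a generic sketch under simplifying assumptions, not BOLT's microarchitecture. It does not model SMT contexts, the paper's slice buffer organization, or pruning. It handles only loads and adds (no stores), and it assumes every miss has returned before re-execution. The instruction format (`Instr`) and all function names are hypothetical. Two bookkeeping details keep the result correct: operands that are already ready are captured when an instruction is deferred, and a per-register last-writer index stops a re-executed result from overwriting a younger independent write.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str            # hypothetical ISA: "load" reads memory[imm]; "add" sums srcs + imm
    dest: str          # destination register name
    srcs: tuple = ()   # source register names
    imm: int = 0       # load address, or addend for "add"


def compute(ins, src_vals, memory):
    """Evaluate one instruction given the values of its source registers."""
    if ins.op == "load":
        return memory[ins.imm]
    return sum(src_vals) + ins.imm


def run_in_order(program, memory, regs=None):
    """Reference model: plain in-order execution with no deferral."""
    regs = dict(regs or {})
    for ins in program:
        regs[ins.dest] = compute(ins, [regs[s] for s in ins.srcs], memory)
    return regs


def run_latency_tolerant(program, memory, llc_misses, regs=None):
    """Toy LT execution: defer the forward slice of LLC-missing loads to a
    slice buffer, execute independent instructions, then re-execute the slice
    in program order once the misses return.
    Returns (final registers, number of deferred instructions)."""
    regs = dict(regs or {})
    poisoned = set()     # registers whose latest value depends on a pending miss
    last_writer = {}     # register -> index of the youngest instruction writing it
    slice_buffer = []    # (index, instr, captured ready operands), program order

    # Phase 1: the misses are outstanding.
    for i, ins in enumerate(program):
        is_miss = ins.op == "load" and ins.imm in llc_misses
        if is_miss or any(s in poisoned for s in ins.srcs):
            # Capture operands that are already ready, so a younger write to the
            # same register cannot change what this instruction sees later.
            ready = {s: regs[s] for s in ins.srcs if s not in poisoned}
            slice_buffer.append((i, ins, ready))
            poisoned.add(ins.dest)
        else:
            regs[ins.dest] = compute(ins, [regs[s] for s in ins.srcs], memory)
            poisoned.discard(ins.dest)
        last_writer[ins.dest] = i

    # Phase 2: the misses have returned; re-execute the slice in program order.
    slice_vals = {}      # values produced by deferred instructions
    for i, ins, ready in slice_buffer:
        vals = [ready[s] if s in ready else slice_vals[s] for s in ins.srcs]
        slice_vals[ins.dest] = compute(ins, vals, memory)
        if last_writer[ins.dest] == i:   # drop results a younger write superseded
            regs[ins.dest] = slice_vals[ins.dest]
    return regs, len(slice_buffer)


if __name__ == "__main__":
    memory = {0x100: 5, 0x200: 7}
    program = [
        Instr("load", "r1", imm=0x100),    # LLC miss: deferred
        Instr("add", "r2", ("r1",), 1),    # depends on the miss: deferred
        Instr("load", "r3", imm=0x200),    # hit: executes
        Instr("add", "r4", ("r3",), 2),    # independent: executes
        Instr("add", "r5", ("r2", "r3")),  # deferred; captures r3 = 7
        Instr("add", "r3", ("r4",)),       # overwrites r3 after the capture
        Instr("add", "r2", ("r4",), 10),   # younger write to r2 than the deferred one
    ]
    regs, deferred = run_latency_tolerant(program, memory, llc_misses={0x100})
    print(regs == run_in_order(program, memory), deferred)  # True 3
```

Running the module prints `True 3`. Three miss-dependent instructions are deferred, and the final register state still matches in-order execution. The paper describes how BOLT provides this bookkeeping in hardware (by reusing SMT resources and using a new slice buffer organization); none of that is modeled here.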


Bibliographic record

Source: IEEE International Symposium on High Performance Computer Architecture | 2010 | 12 pages
Authors: Andrew Hilton; Amir Roth
Original format: PDF
Chinese Library Classification: TP3-53

Similar literature


1. CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance [J]. Mohammadi Milad, Aamodt Tor M., Dally William J. ACM Transactions on Architecture and Code Optimization, 2017, no. 4

2. SPIDER-based out-of-order execution scheme for Ht-MPSOC [J]. R. Karthick, M. Sundararajan. International Journal of Advanced Intelligence Paradigms, 2021, no. 1

3. O⁴-DNN: A Hybrid DSP-LUT-Based Processing Unit With Operation Packing and Out-of-Order Execution for Efficient Realization of Convolutional Neural Networks on FPGA Devices [J]. Haghi Pouya, Kamal Mehdi, Afzali-Kusha Ali, et al. IEEE Transactions on Circuits and Systems I: Regular Papers, 2020, no. 9

4. BOLT: Energy-efficient Out-of-Order Latency-Tolerant execution [C]. Hilton A., Roth A. 2010 IEEE 16th International Symposium on High Performance Computer Architecture, 2010

5. Bamboo: Automatic Translation of MPI Source into a Latency-Tolerant Form [D]. Nguyen Thanh, Nhat Tan. 2014

6. Low-Element Image Restoration Based on an Out-of-Order Elimination Algorithm [O]. Yaqin Xie, Jiayin Yu, Xinwu Chen, et al. 2019

7. BOLT: Energy-Efficient Out-of-Order Latency-Tolerant Execution [O]. Andrew Hilton, Amir Roth. 2010
