Chainsaw: Von-neumann accelerators to leverage fused instruction chains


Abstract

A central tenet behind accelerators is to partition a program's execution into regions with different behavior (e.g., SIMD, Irregular, Compute-Intensive) and then use behavior-specialized architectures [1] for each region. It is unclear whether the gains in efficiency arise from recognizing that a simpler microarchitecture is sufficient for the acceleratable code region, from the actual microarchitecture, or from a combination of both. Many proposals [2], [3] seem to choose dataflow-based accelerators, which encounter challenges with fabric utilization and static power when the available instruction parallelism is below the peak operation parallelism available [4]. In this paper, we develop Chainsaw, a Von-Neumann-based accelerator, and demonstrate that many of the fundamental overheads (e.g., fetch-decode) can be amortized by adopting the appropriate instruction abstraction. The key insight is the notion of chains, which are compiler-fused sequences of instructions. Chains adapt to different acceleration behaviors by varying the length of the chains and the types of instructions that are fused into a chain. Chains convey the producer-consumer locality between dependent instructions, which the Chainsaw architecture then captures by temporally scheduling such operations on the same execution unit and using pipeline registers to forward the values between dependent operations. Chainsaw is a generic multi-lane architecture (4-stage pipeline per lane) and does not require any specialized compound function units; it can be reloaded, enabling it to accelerate multiple program paths. We have developed a complete LLVM-based compiler prototype and simulation infrastructure and demonstrated that an 8-lane Chainsaw is within 73% of the performance of an ideal dataflow architecture, while reducing the energy consumption by 45% compared to a 4-way OOO processor.
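To make the idea of a chain concrete, the sketch below greedily groups dependent instructions into producer-consumer sequences of bounded length. It is an illustration only: the greedy heuristic, the `form_chains` function, the `max_len` parameter, and the toy instruction names are all assumptions made here, not the compiler algorithm or instruction format described in the paper.

```python
def form_chains(deps, max_len=4):
    """Greedily fuse dependent instructions into chains (illustrative only).

    deps maps each instruction name to the list of instructions that
    produce its operands. Keys must be listed in program (topological)
    order. Producers that are not keys are treated as live-in values.
    """
    chain_of = {}  # instruction -> index of the chain that holds it
    chains = []    # each chain is a list of instructions in execution order
    for inst, producers in deps.items():
        placed = False
        for p in producers:
            idx = chain_of.get(p)
            # Extend a chain only if the producer is its current tail,
            # so the value can be forwarded directly to the next operation.
            if idx is not None and chains[idx][-1] == p and len(chains[idx]) < max_len:
                chains[idx].append(inst)
                chain_of[inst] = idx
                placed = True
                break
        if not placed:
            # No producer chain can be extended: start a new chain.
            chain_of[inst] = len(chains)
            chains.append([inst])
    return chains


# Hypothetical five-instruction dataflow graph.
example = {
    "a": [],          # e.g. a load
    "b": ["a"],
    "c": ["b"],
    "d": ["a"],
    "e": ["c", "d"],
}

print(form_chains(example))             # [['a', 'b', 'c', 'e'], ['d']]
print(form_chains(example, max_len=2))  # [['a', 'b'], ['c', 'e'], ['d']]
```

Each resulting chain would be a candidate for execution on a single lane, with intermediate values passed from one operation to the next. Varying `max_len` mimics, in a very rough way, the abstract's point that chain length can be adjusted to match different acceleration behaviors.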


Bibliographic details

Source
International Symposium on Microarchitecture | 2016 | 766 p. | 14 pages
Authors
Amirali Sharifian; Snehasish Kumar; Apala Guha; Arrvindh Shriraman
Original format: PDF
Chinese Library Classification: TP302-532
Keywords
Registers; Hardware; Parallel processing; Out of order; Pipelines; Computer architecture; Fabrics

Date indexed: 2022-08-21 04:38:57
