International Symposium on Microarchitecture

Chainsaw: Von-neumann accelerators to leverage fused instruction chains


Abstract

A central tenet behind accelerators is to partition a program's execution into regions with different behavior (e.g., SIMD, Irregular, Compute-Intensive) and then use behavior-specialized architectures [1] for each region. It is unclear whether the gains in efficiency arise from recognizing that a simpler microarchitecture is sufficient for the acceleratable code region, from the actual microarchitecture, or from a combination of both. Many proposals [2], [3] choose dataflow-based accelerators, which encounter challenges with fabric utilization and static power when the available instruction parallelism is below the peak operation parallelism available [4]. In this paper, we develop Chainsaw, a Von-Neumann-based accelerator, and demonstrate that many of the fundamental overheads (e.g., fetch-decode) can be amortized by adopting the appropriate instruction abstraction. The key insight is the notion of chains, which are compiler-fused sequences of instructions. Chains adapt to different acceleration behaviors by varying the length of the chains and the types of instructions that are fused into a chain. Chains convey the producer-consumer locality between dependent instructions, which the Chainsaw architecture then captures by temporally scheduling such operations on the same execution unit and using pipeline registers to forward the values between dependent operations. Chainsaw is a generic multi-lane architecture (4-stage pipeline per lane) and does not require any specialized compound function units; it can be reloaded, enabling it to accelerate multiple program paths. We have developed a complete LLVM-based compiler prototype and simulation infrastructure and demonstrated that an 8-lane Chainsaw is within 73% of the performance of an ideal dataflow architecture, while reducing the energy consumption by 45% compared to a 4-way OOO processor.
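The notion of a chain can be illustrated with a toy sketch. This is not the paper's compiler pass, which the abstract does not describe: the function names `form_chains` and `forwarded_edges`, the greedy fusion policy, the `max_len` parameter, and the example dependence graph are all hypothetical. The sketch only shows how dependent instructions might be grouped so that each value passes directly from a producer to the next operation in the same chain.

```python
from collections import defaultdict


def form_chains(deps, max_len=4):
    """Greedily group instructions into chains of dependent operations.

    deps maps each instruction name to the list of instructions whose
    results it consumes. Keys must be listed in topological order.
    (Hypothetical policy, assumed for illustration only.)
    """
    # Invert the dependence map: producer -> list of consumers.
    consumers = defaultdict(list)
    for inst, producers in deps.items():
        for p in producers:
            consumers[p].append(inst)

    assigned = set()
    chains = []
    for inst in deps:  # dicts preserve insertion (topological) order
        if inst in assigned:
            continue
        chain = [inst]
        assigned.add(inst)
        # Extend the chain with an unassigned consumer of the current tail.
        while len(chain) < max_len:
            nxt = next(
                (c for c in consumers[chain[-1]] if c not in assigned), None
            )
            if nxt is None:
                break
            chain.append(nxt)
            assigned.add(nxt)
        chains.append(chain)
    return chains


def forwarded_edges(deps, chains):
    """Count producer-consumer edges between adjacent ops of one chain."""
    adjacent = {(a, b) for chain in chains for a, b in zip(chain, chain[1:])}
    return sum((p, inst) in adjacent for inst, ps in deps.items() for p in ps)


# Hypothetical graph for: t = (a + b) * c; u = t - d; v = (e << 1) + u
deps = {
    "add": [],
    "mul": ["add"],
    "sub": ["mul"],
    "shl": [],
    "add2": ["shl", "sub"],
}
chains = form_chains(deps, max_len=4)
print(chains)
print(forwarded_edges(deps, chains), "of 4 edges stay inside a chain")
```

In this toy graph, lowering `max_len` to 2 splits the long chain in two, so fewer values are handed directly to the next operation. That mirrors, in a very simplified form, the abstract's point that chain length is a knob the compiler can vary.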
