IEEE International Solid-State Circuits Conference

31.3 A Compute-Adaptive Elastic Clock-Chain Technique with Dynamic Timing Enhancement for 2D PE-Array-Based Accelerators


Abstract

Dynamic timing error detection and correction techniques, e.g., Razor flip-flops, have previously been applied to microprocessors to exploit the dynamic timing margin within pipelines [1]. Adaptive clock techniques have also been adopted to enhance microprocessor performance, for example by reducing the timing guardband for on-chip supply droops [2]–[3] or by exploiting instruction-level dynamic timing slack [4]. Recently, 2D PE-array-based accelerators have been developed for machine learning (ML) applications. Much effort has been dedicated to improving the energy efficiency of such accelerators, e.g., DVFS management for DNNs under various bit precisions [5]. A Razor technique was also applied to a 1D 8-MAC pipelined accelerator to explore timing error tolerance [6]. Despite these efforts, a fine-grained dynamic-timing-based technique has not been implemented within a large 2D-array-based ML accelerator. One main challenge is the large number of compute-timing bottlenecks within the 2D array, which continuously trigger critical-path adaptation or pipeline stalls, nullifying the benefits of previous dynamic-timing techniques [4], [6]. To address this difficulty, we propose the following solutions. A local in-situ compute-detection scheme was applied to anticipate upcoming timing variations within the PE unit and to guide both instruction-based and operand-based adaptive clock management. To loosen the stringent timing requirements in a large 2D PE array, an “elastic” clock-chain technique using multiple loosely synchronized clock domains was developed, enabling dynamic-timing enhancement through clusters of PE units.
