ACM/IEEE Design Automation Conference

L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network

Abstract

A binary-weight hourglass network (B-HG) accelerator for landmark detection is built on the proposed look-up-table (LUT) based multi-level prediction-correction approach, enabling high-speed and energy-efficient processing on IoT edge devices. First, a LUT with a unified mode is adopted to support convolutional neural networks with fully variable weight bit precision and to minimize the operations of B-HG; it achieves a $1.33\times$-$1.50\times$ speedup on multi-bit-weight CNNs relative to a similar solution. Second, a multi-level prediction-correction model is proposed to achieve computationally efficient convolution with adaptive precision. The number of operations saved increases by about 30% compared with the two-stage model. Combining these two methods saves nearly 77.4% of the operations in B-HG, yielding a $2.3\times$ inference speedup. Third, a block-computing-based pipeline is designed to address the residual block deficiency in B-HG. It reduces off-chip memory access by about 66.2% relative to the baseline, and it saves 60% of on-chip memory space and 31% of on-chip memory access compared with a similar fused-layer accelerator. The proposed B-HG accelerator achieves 450 fps at 500 MHz, based on simulation in a TSMC 28 nm process. Its power efficiency reaches 8.5 TOPS/W, two orders of magnitude higher than that of a dedicated face landmark detection accelerator.
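
The abstract does not describe how the LUT is organized, so the following is only a minimal sketch of one common way a look-up table can replace multiply-accumulate work when weights are binary (+1/-1). It is not the paper's architecture. The function names (`build_lut`, `pack_weights`, `lut_dot`) and the group size of 4 are assumptions made for illustration. The idea shown: for each small group of activations, precompute every possible signed sum once, then let each binary weight pattern select its partial sum by index, so the table is shared across all output channels.

```python
def build_lut(acts):
    """Precompute all 2^k signed sums for a group of k activations.

    Bit i of the table index set means weight +1 for acts[i];
    bit i clear means weight -1. (Illustrative encoding, an assumption.)
    """
    k = len(acts)
    return [
        sum(a if (idx >> i) & 1 else -a for i, a in enumerate(acts))
        for idx in range(1 << k)
    ]


def pack_weights(weights):
    """Pack a group of +1/-1 weights into an integer table index."""
    idx = 0
    for i, w in enumerate(weights):
        if w == 1:
            idx |= 1 << i
    return idx


def lut_dot(acts, weight_rows, group=4):
    """Dot product of `acts` with each +1/-1 weight row via table lookups.

    The tables depend only on the activations, so they are built once
    and reused for every weight row (e.g. every output channel).
    """
    assert len(acts) % group == 0, "sketch assumes length divisible by group"
    luts = [build_lut(acts[g:g + group]) for g in range(0, len(acts), group)]
    results = []
    for row in weight_rows:
        total = 0
        for gi, lut in enumerate(luts):
            total += lut[pack_weights(row[gi * group:(gi + 1) * group])]
        results.append(total)
    return results


acts = [1, 2, 3, 4, 5, 6, 7, 8]
rows = [
    [1, -1, 1, -1, 1, 1, -1, -1],
    [1] * 8,
    [-1] * 8,
]
print(lut_dot(acts, rows))  # [-6, 36, -36]
```

In this sketch each weight row costs one lookup and one addition per group of four activations instead of four multiply-accumulates; the table construction cost is amortized only when many weight rows share the same activations.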