
L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network



Abstract

A binary-weight hourglass network (B-HG) accelerator for landmark detection, built on the proposed look-up-table (LUT) based multi-level prediction-correction approach, enables high-speed and energy-efficient processing on IoT edge devices. First, a LUT with a unified mode is adopted to support convolutional neural networks with fully variable weight bit precision and to minimize the operations of B-HG, which achieves a $1.33\times$ to $1.50\times$ speedup on multi-bit weight CNNs relative to a similar solution. Second, a multi-level prediction-correction model is proposed to achieve computationally efficient convolution with adaptive precision. The operations saved increase by about 30% compared with the two-stage model. In addition, combining these two methods saves nearly 77.4% of the operations in B-HG, yielding a 2.3× inference speedup. Third, a block-computing-based pipeline is designed to address the residual block deficiency in B-HG. It reduces off-chip memory access by about 66.2% relative to the baseline, and it saves 60% of on-chip memory space and 31% of on-chip memory access compared with a similar fused-layer accelerator. The proposed B-HG accelerator achieves 450 fps at 500 MHz, based on simulation in a TSMC 28 nm process. Meanwhile, the power efficiency reaches up to 8.5 TOPS/W, which is two orders of magnitude higher than that of a dedicated face landmark detection accelerator.
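The abstract does not spell out how the LUT is organized, so the following is only a minimal sketch of the general idea behind LUT-based binary-weight convolution, not the L-MPC architecture itself. It assumes weights restricted to +1/-1 and a hypothetical group size of 4: for each group of activations, every possible signed sum is precomputed once, and each filter then needs one table lookup per group instead of one multiply-accumulate per weight. The function names (`build_lut`, `build_luts`, `pack_bits`, `lut_dot`) are illustrative and do not come from the paper.

```python
def build_lut(acts):
    """Precompute the signed sum of `acts` for every +1/-1 weight pattern.

    Entry idx corresponds to the pattern whose bit j (LSB first) is 1 when
    the weight on acts[j] is +1 and 0 when it is -1.
    """
    lut = []
    for idx in range(1 << len(acts)):
        total = 0
        for j, a in enumerate(acts):
            total += a if (idx >> j) & 1 else -a
        lut.append(total)
    return lut


def build_luts(acts, group=4):
    """Build one table per group of `group` consecutive activations."""
    return [build_lut(acts[s:s + group]) for s in range(0, len(acts), group)]


def pack_bits(weights):
    """Pack +1/-1 weights into an integer table index, LSB first."""
    idx = 0
    for j, w in enumerate(weights):
        if w == 1:
            idx |= 1 << j
    return idx


def lut_dot(luts, weights, group=4):
    """Binary-weight dot product using one lookup per activation group."""
    total = 0
    for i, lut in enumerate(luts):
        total += lut[pack_bits(weights[i * group:(i + 1) * group])]
    return total


if __name__ == "__main__":
    acts = [3, -1, 4, 1, 5, 9, 2, 6]
    filters = [
        [1, -1, 1, 1, -1, -1, 1, -1],
        [-1, -1, 1, -1, 1, 1, 1, -1],
    ]
    luts = build_luts(acts)  # built once, reused by every filter
    for w in filters:
        direct = sum(a * b for a, b in zip(acts, w))
        print(lut_dot(luts, w), direct)  # the two values should match
```

The tables pay off only when the same activations are shared by many filters, as in a convolution layer with many output channels; the table-building cost is then amortized over all lookups.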


Bibliographic details

Source
ACM/IEEE Design Automation Conference | 2019 | 643 p. | 6 pages
Conference location
Authors
Hong Liu; Leibo Liu; Wenping Zhu; Qiang Li; Huiyu Mo; Shaojun Wei
Author affiliations

Conference organizer
Original format: PDF
Language of text
Chinese Library Classification: TH1
Keywords
Table lookup; Kernel; Convolution; Computational modeling; Energy efficiency; Image edge detection; Hardware


