IEEE Transactions on Parallel and Distributed Systems

O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference



Abstract

Binarized Neural Networks (BNN), which significantly reduce computational complexity and memory demand, have shown potential in cost- and power-restricted domains, such as IoT and smart edge-devices, where reaching certain accuracy bars is sufficient and real-time is highly desired. In this article, we demonstrate that the highly-condensed BNN model can be shrunk significantly by dynamically pruning irregular redundant edges. Based on two new observations on BNN-specific properties, an out-of-order (OoO) architecture, O3BNN-R, which can curtail edge evaluation in cases where the binary output of a neuron can be determined early at runtime during inference, is proposed. Similar to instruction level parallelism (ILP), fine-grained, irregular, and runtime pruning opportunities are traditionally presumed to be difficult to exploit. To further enhance the pruning opportunities, we conduct an algorithm/architecture co-design approach where we augment the loss function during the training stage with specialized regularization terms favoring edge pruning. We evaluate our design on an embedded FPGA using networks that include VGG-16, AlexNet for ImageNet, and a VGG-like network for Cifar-10. Results show that O3BNN-R without regularization can prune, on average, 30 percent of the operations, without any accuracy loss, bringing 2.2x inference-speedup, and on average 34x energy-efficiency improvement over state-of-the-art BNN implementations on FPGA/GPU/CPU. With regularization at training, the performance is further improved, on average, by 15 percent.


