IEEE Transactions on Parallel and Distributed Systems

O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference



Abstract

Binarized Neural Networks (BNN), which significantly reduce computational complexity and memory demand, have shown potential in cost- and power-restricted domains, such as IoT and smart edge-devices, where reaching certain accuracy bars is sufficient and real-time is highly desired. In this article, we demonstrate that the highly-condensed BNN model can be shrunk significantly by dynamically pruning irregular redundant edges. Based on two new observations on BNN-specific properties, an out-of-order (OoO) architecture, O3BNN-R, which can curtail edge evaluation in cases where the binary output of a neuron can be determined early at runtime during inference, is proposed. Similar to instruction level parallelism (ILP), fine-grained, irregular, and runtime pruning opportunities are traditionally presumed to be difficult to exploit. To further enhance the pruning opportunities, we conduct an algorithm/architecture co-design approach where we augment the loss function during the training stage with specialized regularization terms favoring edge pruning. We evaluate our design on an embedded FPGA using networks that include VGG-16, AlexNet for ImageNet, and a VGG-like network for Cifar-10. Results show that O3BNN-R without regularization can prune, on average, 30 percent of the operations, without any accuracy loss, bringing 2.2x inference-speedup, and on average 34x energy-efficiency improvement over state-of-the-art BNN implementations on FPGA/GPU/CPU. With regularization at training, the performance is further improved, on average, by 15 percent.


