PULP-NN: accelerating quantized neural networks on parallel ultra-low-power RISC-V processors

Journal: Philosophical transactions of the Royal Society. Mathematical, physical, and engineering sciences
Abstract

We present PULP-NN, an optimized computing library for a parallel, ultra-low-power, tightly coupled cluster of RISC-V processors. The key innovation in PULP-NN is a set of kernels for quantized neural network inference, targeting byte and sub-byte data types, down to INT-1, tuned for the recent trend toward aggressive quantization in deep neural network inference. The proposed library exploits both the digital signal processing extensions available in the PULP RISC-V processors and the cluster's parallelism, achieving up to 15.5 MACs/cycle on INT-8 and improving performance by up to 63× with respect to a sequential implementation on a single RISC-V core implementing the baseline RV32IMC ISA. Using PULP-NN, a CIFAR-10 network on an octa-core cluster runs in 30× and 19.6× fewer clock cycles than the current state-of-the-art ARM CMSIS-NN library running on STM32L4 and STM32H7 MCUs, respectively. When running on a GAP-8 processor at maximum frequency, the proposed library outperforms execution on energy-efficient MCUs such as the STM32L4 by 36.8× and on high-end MCUs such as the STM32H7 by 7.45×. At the maximum-efficiency operating point, the energy efficiency on GAP-8 is 14.1× higher than on the STM32L4 and 39.5× higher than on the STM32H7. This article is part of the theme issue 'Harmonizing energy-autonomous computing and intelligence'.
