IEEE International Solid-State Circuits Conference

Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications



Abstract

Convolutional neural networks (CNNs) provide state-of-the-art results in a wide variety of machine learning (ML) applications, ranging from image classification to speech recognition. However, they are very computationally intensive and require huge amounts of storage. Recent work has strived to reduce the size of CNNs: [1] proposes a binary-weight network (BWN), where the filter weights (wi's) are ±1 (with a common scaling factor per filter: α). This leads to a significant reduction in the amount of storage required for the wi's, making it possible to store them entirely on-chip. However, in a conventional all-digital implementation [2, 3], reading the wi's and the partial sums from the embedded SRAMs requires a lot of data movement per computation, which is energy-hungry. To reduce data movement, and the associated energy, we present an SRAM-embedded convolution architecture (Fig. 31.1.1), which does not require reading the wi's explicitly from the memory. Prior work on embedded ML classifiers has focused on 1b outputs [4] or a small number of output classes [5], neither of which is sufficient for CNNs. This work uses 7b inputs/outputs, which is sufficient to maintain good accuracy for most of the popular CNNs [1]. The convolution operation is implemented as voltage averaging (Fig. 31.1.1), since the wi's are binary, while the averaging factor (1/N) implements the weight coefficient α (with a new scaling factor, M, implemented off-chip).
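The arithmetic behind the voltage-averaging scheme described in the abstract can be sketched numerically: a BWN dot product is α·Σ(xi·wi) with wi ∈ {+1, −1}, and the in-SRAM version computes the average (1/N)·Σ(xi·wi), with the per-filter scale folded into an off-chip factor M. The function names and the choice M = α·N below are illustrative assumptions, not from the paper:

```python
import numpy as np

def bwn_dot(x, w_bin, alpha):
    """Reference binary-weight dot product: alpha * sum(x_i * w_i),
    with w_bin entries in {+1, -1} and alpha the per-filter scale."""
    return alpha * np.dot(x, w_bin)

def conv_ram_average(x, w_bin, M):
    """Averaging-style computation: (1/N) * sum(x_i * w_i), as produced
    by voltage averaging over N bit-cells, rescaled off-chip by M."""
    return M * np.mean(x * w_bin)
```

Choosing M = α·N makes the two formulations numerically identical, which is why the averaging factor 1/N can absorb the weight coefficient α.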
