IEEE International Symposium on Circuits and Systems

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

Abstract

Binary Neural Networks (BNNs) enable smart IoT devices because they significantly reduce the required memory footprint and computational complexity while retaining high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm² binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data reuse, data buffering, latch-based memories, and voltage scaling, it achieves a throughput of 241 GOPS while consuming just 1.1 mW at 0.4 V/154 MHz during inference of binary CNNs with kernels of up to 7×7, yielding a peak core energy efficiency of 223 TOPS/W. ChewBaccaNN's flexibility allows it to run a much wider range of binary CNNs than other accelerators, drastically improving the accuracy-energy trade-off beyond what the TOPS/W metric alone can capture. In fact, it can perform CIFAR-10 inference at 86.8% accuracy with merely 1.3 µJ, exceeding the accuracy and lowering the energy cost by 2.8× compared to even the most efficient and much larger analog processing-in-memory devices, while retaining the flexibility to run larger CNNs for higher accuracy when needed. It also runs a binary ResNet-18 trained on the 1000-class ILSVRC dataset, improving energy efficiency by 4.4× over accelerators of similar flexibility. Furthermore, it can perform inference on a binarized ResNet-18 trained with Group-Net using 8 bases, achieving 67.5% Top-1 accuracy at only 3.0 mJ/frame, an accuracy drop of merely 1.8% from the full-precision ResNet-18.
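The abstract attributes BNN efficiency to reduced computational complexity without describing ChewBaccaNN's datapath. As a generic illustration only, binary CNNs are commonly computed by replacing multiply-accumulate on {+1, -1} values with XNOR and popcount; the sketch below shows that identity, together with the unit arithmetic behind a TOPS/W figure. The function names (`pack_bits`, `binary_dot`, `tops_per_watt`) and the bit-packing convention (bit set means +1) are hypothetical choices for this example, not the paper's design.

```python
# Generic sketch of a binary (+1/-1) dot product via XNOR + popcount, and of
# the GOPS/mW -> TOPS/W arithmetic. Names and the encoding (bit = 1 means +1)
# are illustrative assumptions, not ChewBaccaNN's actual implementation.


def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int: bit i set encodes +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary inputs must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n +1/-1 vectors given their packed bits.

    XNOR is 1 where the signs agree (product +1) and 0 where they differ
    (product -1), so dot = matches - (n - matches) = 2 * popcount - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask  # mask keeps only the n valid bits
    matches = bin(xnor).count("1")
    return 2 * matches - n


def tops_per_watt(throughput_gops, power_mw):
    """GOPS divided by mW is numerically equal to TOPS/W (1e9/1e-3 = 1e12)."""
    return throughput_gops / power_mw


if __name__ == "__main__":
    # A flattened 3x3 activation window and binary kernel (example values).
    window = [1, -1, 1, 1, -1, -1, 1, 1, -1]
    kernel = [1, 1, -1, 1, -1, 1, 1, -1, -1]
    result = binary_dot(pack_bits(window), pack_bits(kernel), len(window))
    reference = sum(x * y for x, y in zip(window, kernel))
    print(result, reference)  # both equal 1
    # Rounded figures from the abstract: 241 GOPS at 1.1 mW.
    print(round(tops_per_watt(241, 1.1), 1))  # 219.1
```

With the rounded throughput and power quoted in the abstract, the ratio comes out near 219 TOPS/W, close to the reported 223 TOPS/W peak; the gap presumably reflects rounding of the published figures rather than a different metric.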