IEEE Journal of Solid-State Circuits

A Sparse Coding Neural Network ASIC With On-Chip Learning for Feature Extraction and Encoding


Abstract

Hardware-based computer vision accelerators will be an essential part of future mobile devices to meet low-power and real-time processing requirements. To achieve high energy efficiency and high throughput, the accelerator architecture can be massively parallelized and tailored to vision processing, an advantage over software-based solutions and general-purpose hardware. In this work, we present an ASIC designed to learn and extract features from images and videos. The ASIC contains 256 leaky integrate-and-fire neurons connected in a scalable two-layer network of 8 × 8 grids linked in a 4-stage ring. Sparse neuron activation and the relatively small grid keep the spike collision probability low, saving access arbitration. The weight memory is divided into core memory and auxiliary memory, so that the auxiliary memory is powered on only during learning to save inference power. High-throughput inference is accomplished by the parallel operation of neurons. Efficient learning is implemented by passing parameter update messages, which is further simplified by an approximation technique. A 3.06 mm² 65 nm CMOS ASIC test chip achieves a maximum inference throughput of 1.24 Gpixel/s at 1.0 V and 310 MHz, and on-chip learning completes in seconds. To reduce power consumption and improve energy efficiency, the core memory supply voltage can be lowered to 440 mV, exploiting the error resilience of the algorithm and reducing the inference power to 6.67 mW at a 140 Mpixel/s throughput and 35 MHz.
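The abstract refers to leaky integrate-and-fire (LIF) neurons whose sparse activation keeps spike collisions rare. The sketch below shows a generic software LIF update for orientation only; it is not the chip's datapath, and the function `lif_step`, the leak and threshold constants, and the toy dimensions are assumptions made for illustration.

```python
import numpy as np

def lif_step(v, x, w, leak=0.9, threshold=1.0):
    """One leaky integrate-and-fire update for a layer of neurons.

    v: membrane potentials, shape (n_neurons,)
    x: flattened input patch, shape (n_inputs,)
    w: feed-forward weights, shape (n_neurons, n_inputs)

    Leak the potential, integrate the weighted input, and emit a
    spike wherever the potential crosses the threshold; spiking
    neurons are reset, which keeps activity sparse.
    """
    v = leak * v + w @ x          # leak, then integrate weighted input
    spikes = v >= threshold       # fire where the threshold is crossed
    v = np.where(spikes, 0.0, v)  # reset the neurons that spiked
    return v, spikes.astype(np.float32)

# Toy usage: 256 neurons (matching only the chip's neuron count) driven
# by an arbitrary 16x16 patch; weights and constants here are made up.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 256))
v = np.zeros(256)
patch = rng.random(256)
for _ in range(4):
    v, spikes = lif_step(v, patch, w)
```

In this reading, the spike vector is the sparse code for the patch; the paper's on-chip learning additionally updates the weights from such spikes, which the sketch does not attempt to model.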
