
7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture



Abstract

Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC) and recurrent NNs (RNN) are the three major workloads. To support all of them, previous solutions [1-3] use either area-inefficient heterogeneous architectures, including separate CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8-to-128× storage savings and an O(n²)-to-O(n log n) reduction in computation complexity.
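To make the abstract's key claim concrete, the sketch below illustrates why block-circulant weights cut both storage and compute. It is a minimal software illustration, not the paper's hardware pipeline; the function names (`circulant_mvm`, `block_circulant_mvm`) and the nested-list layout of the stored columns are assumptions for this example only. Each b×b circulant block is fully described by its first column (b values instead of b²), and its matrix-vector product becomes an FFT-based circular convolution in O(b log b) instead of O(b²).

```python
import numpy as np

def circulant_mvm(c, x):
    """Multiply a circulant matrix, defined by its first column c, with
    vector x. Uses the circular-convolution theorem, so the cost is
    O(b log b) FFT work instead of an explicit O(b^2) matrix product."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_mvm(first_cols, x, b):
    """Block-circulant MVM: the weight matrix is partitioned into a
    p x q grid of b x b circulant blocks. first_cols[i][j] holds the
    first column of block (i, j), so only b values are stored per block
    rather than b^2."""
    p, q = len(first_cols), len(first_cols[0])
    x_blocks = x.reshape(q, b)
    y = np.zeros(p * b)
    for i in range(p):
        acc = np.zeros(b)
        for j in range(q):
            acc += circulant_mvm(first_cols[i][j], x_blocks[j])
        y[i * b:(i + 1) * b] = acc
    return y

# Quick check of one block against an explicitly built circulant matrix.
b = 4
c = np.array([1.0, 2.0, 3.0, 4.0])
C = np.array([[c[(i - j) % b] for j in range(b)] for i in range(b)])
x = np.arange(b, dtype=float)
assert np.allclose(C @ x, circulant_mvm(c, x))
```

Because CNN, FC and RNN layers trained with this weight pattern all reduce to the same block-wise FFT-based MVM, a single datapath can serve all three workload types, which is the unification the abstract refers to.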
