Mapping Systolic Arrays onto 3D Circuit Structures: Accelerating Convolutional Neural Network Inference

Conference: IEEE International Workshop on Signal Processing Systems

Abstract

In recent years, numerous designs have used systolic arrays to accelerate convolutional neural network (CNN) inference. In this work, we demonstrate that we can further speed up CNN inference and lower its power consumption by mapping systolic arrays onto 3D circuit structures rather than conventional 2D structures. Specifically, by operating in 3D space, a wide systolic array consisting of a number of subarrays can efficiently implement the wide convolutional layers prevalent in state-of-the-art CNNs. Additionally, by accumulating intermediate results along the third dimension, systolic arrays can process partitioned data channels in parallel with reduced data skew, lowering inference latency. We present a building-block design that uses through-silicon vias (TSVs) for the 3D realization of systolic subarrays. We validate the 3D scheme using a 2.5D FPGA design and demonstrate that, when mapped onto 3D structures, wide systolic arrays can scale up in size without increasing the wiring length between subarrays. Further, by taking full advantage of 3D structures, we are able to pipeline inference across multiple layers of a CNN over a series of systolic arrays, dramatically reducing the inference time per input sample. These improvements significantly reduce inference latency, which is especially important for real-time applications, where samples are commonly processed one at a time.
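The idea of accumulating intermediate results along the third dimension can be illustrated with a small functional model. This is an illustrative sketch, not the design from the paper: it assumes the C input channels are split evenly over K stacked subarrays, that each subarray produces a partial product over its channel slice, and that the partial sums are added along the stacking (z) axis. The skew estimate uses a simplified model of a weight-stationary array with one input channel per row and a one-cycle delay per row, and it ignores the extra adder stages of the vertical accumulation. All function names are hypothetical.

```python
"""Illustrative model of channel partitioning across stacked systolic subarrays.

Assumptions (not taken from the paper): one input channel per array row,
one cycle of input skew per row, and an even split of channels over layers.
"""
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference matrix multiply: (M x C) @ (C x N) -> (M x N)."""
    n_cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(n_cols)]
        for row in a
    ]


def stacked_matmul(a: Matrix, b: Matrix, num_layers: int) -> Matrix:
    """Split the C input channels into num_layers slices, compute one partial
    product per (hypothetical) stacked subarray, then sum the partial
    results along the stacking (z) dimension."""
    channels = len(b)
    assert channels % num_layers == 0, "sketch assumes an even channel split"
    step = channels // num_layers
    partials = []
    for z in range(num_layers):
        lo, hi = z * step, (z + 1) * step
        a_slice = [row[lo:hi] for row in a]  # input channels handled by layer z
        b_slice = b[lo:hi]                   # matching weight rows
        partials.append(matmul(a_slice, b_slice))
    # Accumulate the partial sums of all layers element by element.
    m, n = len(a), len(b[0])
    return [[sum(p[i][j] for p in partials) for j in range(n)] for i in range(m)]


def input_skew_cycles(channels: int, num_layers: int) -> int:
    """Input skew under the assumed one-cycle-per-row model: a subarray with
    R rows waits R - 1 cycles before its last row receives data. Stacking
    divides the rows per subarray by num_layers."""
    rows_per_subarray = channels // num_layers
    return rows_per_subarray - 1


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8]]        # 2 samples x 4 channels
    b = [[1, 0], [0, 1], [1, 1], [2, -1]]   # 4 channels x 2 filters
    print(stacked_matmul(a, b, 2))          # [[12, 1], [28, 5]]
    print(stacked_matmul(a, b, 2) == matmul(a, b))  # True
    print(input_skew_cycles(128, 1), input_skew_cycles(128, 4))  # 127 31
```

Under these assumptions, splitting 128 channels over four layers lowers the modeled input skew from 127 to 31 cycles, while the summed partial products remain identical to the unsplit result.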
