Mapping Systolic Arrays onto 3D Circuit Structures: Accelerating Convolutional Neural Network Inference

Abstract

In recent years, numerous designs have used systolic arrays to accelerate convolutional neural network (CNN) inference. In this work, we demonstrate that we can further speed up CNN inference and lower its power consumption by mapping systolic arrays onto 3D circuit structures as opposed to conventional 2D structures. Specifically, by operating in 3D space, a wide systolic array consisting of a number of subarrays can efficiently implement the wide convolutional layers prevalent in state-of-the-art CNNs. Additionally, by accumulating intermediate results along the third dimension, systolic arrays can process partitioned data channels in parallel with reduced data skew, lowering inference latency. We present a building-block design using through-silicon vias (TSVs) for the 3D realization of systolic subarrays. We validate the 3D scheme using a 2.5D FPGA design and demonstrate that, when mapped onto 3D structures, wide systolic arrays can scale up in size without increasing the wiring length needed to interconnect subarrays. Further, by taking full advantage of 3D structures, we are able to pipeline inference across multiple layers of a CNN over a series of systolic arrays, dramatically reducing the inference time per input sample. These improvements lead to significantly reduced inference latency, which is especially important for real-time applications where samples are commonly processed one at a time.
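The channel-partitioning idea can be illustrated with a small functional model. This is a hypothetical sketch, not the authors' hardware design: it assumes a convolutional layer already lowered to a matrix product Y = W @ X (W: M filters x C channels, X: C channels x N output positions), and the names `matmul`, `partitioned_matmul`, and `input_skew_cycles` are illustrative only. Each subarray multiplies its own slice of channels, and the element-wise addition of the partial sums stands in for accumulation along the third dimension. The skew function uses an assumed, simplified model in which an R-row systolic array staggers its inputs by one cycle per row.

```python
"""Hypothetical functional model of channel-partitioned systolic subarrays.

Not the authors' hardware design: a convolutional layer is assumed to be
lowered to a matrix product Y = W @ X, where W is M filters x C channels
and X is C channels x N output positions.
"""
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference product a @ b; also the work done by a single subarray."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def partitioned_matmul(w: Matrix, x: Matrix, num_subarrays: int) -> Matrix:
    """Split the C channels across subarrays, then add the partial sums.

    Subarray p receives a contiguous slice of channels (columns of w and
    rows of x). The final element-wise addition stands in for accumulating
    results along the third dimension, across the stacked subarrays.
    """
    c = len(x)
    if not 1 <= num_subarrays <= c:
        raise ValueError("need between 1 and C subarrays")
    # Integer boundaries; every slice is non-empty because num_subarrays <= c.
    bounds = [p * c // num_subarrays for p in range(num_subarrays + 1)]
    partials = [
        matmul([row[lo:hi] for row in w], x[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
    ]
    m, n = len(w), len(x[0])
    return [[sum(part[i][j] for part in partials) for j in range(n)]
            for i in range(m)]


def input_skew_cycles(rows: int, num_subarrays: int = 1) -> int:
    """Simplified skew model (an assumption): an R-row systolic array feeds
    row r one cycle later than row r-1, so the last row starts R-1 cycles
    after the first. With the rows split evenly over parallel subarrays,
    each subarray has ceil(R / P) rows and correspondingly less skew.
    The cost of the cross-subarray additions is ignored."""
    if rows < 1 or num_subarrays < 1:
        raise ValueError("rows and num_subarrays must be positive")
    rows_per_subarray = -(-rows // num_subarrays)  # ceiling division
    return rows_per_subarray - 1


if __name__ == "__main__":
    w = [[1, 2, 3, 4], [5, 6, 7, 8]]          # 2 filters x 4 channels
    x = [[1, 0], [0, 1], [1, 1], [2, -1]]     # 4 channels x 2 positions
    print(matmul(w, x))                       # [[12, 1], [28, 5]]
    print(partitioned_matmul(w, x, 2))        # [[12, 1], [28, 5]]
    print(input_skew_cycles(128), input_skew_cycles(128, 4))  # 127 31
```

Because the sum over channels can be regrouped freely, every split gives the same result as the full product (exactly for integers, up to rounding for floats). Under the simplified skew model, a single 128-row array has 127 cycles of input skew, while four parallel 32-row subarrays have 31 cycles each, before counting the cross-subarray additions.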

Bibliographic Record

Source
IEEE International Workshop on Signal Processing Systems | 2018 | pp. 330-336 | 7 pages
Authors
H. T. Kung; Bradley McDanel; Sai Qian Zhang
Original format
PDF
Keywords
Arrays; Three-dimensional displays; Wires; Two dimensional displays; Convolution; Through-silicon vias; Physical layer

Similar Documents

1. CSTAT+: A GPU-accelerated spatial pattern analysis algorithm for high-resolution 2D/3D hydrologic connectivity using array vectorization and convolutional neural network operators [J]. Yu Feng, Jonathan M. Harbor. Environmental Modelling & Software, 2019.

2. RNA3DCNN: Local and global quality assessments of RNA 3D structures using 3D deep convolutional neural networks [J]. Jun Li, Wei Zhu, Jun Wang. PLoS Computational Biology, 2018, no. 11.

3. Inception of a Cyber-Infrastructure for Product Design Data and Evaluation of a Customized Multi-View Convolutional Neural Network for 3D CAD Model Classification [D]. Akshay Ganesh Bharadwaj, 2020.

4. RNA3DCNN: Local and global quality assessments of RNA 3D structures using 3D deep convolutional neural networks [O]. Jun Li, Wei Zhu, Jun Wang, 2018.

5. Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing [O]. Akshay Dua, Yixing Li, Fengbo Ren, 2020.