
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks



Abstract

Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. Although highly-parallel compute paradigms, such as SIMD/SIMT, effectively address the computation requirement to achieve high throughput, energy consumption still remains high as data movement can be more expensive than computation. Accordingly, finding a dataflow that supports parallel processing with minimal data movement cost is crucial to achieving energy-efficient CNN processing without compromising accuracy. In this paper, we present a novel dataflow, called row-stationary (RS), that minimizes data movement energy consumption on a spatial architecture. This is realized by exploiting local data reuse of filter weights and feature map pixels, i.e., activations, in the high-dimensional convolutions, and minimizing data movement of partial sum accumulations. Unlike dataflows used in existing designs, which only reduce certain types of data movement, the proposed RS dataflow can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine (PE) local storage, direct inter-PE communication and spatial parallelism. To evaluate the energy efficiency of the different dataflows, we propose an analysis framework that compares energy cost under the same hardware area and processing parallelism constraints. Experiments using the CNN configurations of AlexNet show that the proposed RS dataflow is more energy efficient than existing dataflows in both convolutional (1.4x to 2.5x) and fully-connected layers (at least 1.3x for batch size larger than 16). The RS dataflow has also been demonstrated on a fabricated chip, which verifies our energy analysis.
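The core idea of the row-stationary dataflow described above can be illustrated with a minimal sketch (not the Eyeriss implementation itself, and with hypothetical function names): each processing engine (PE) keeps one filter row stationary in its local storage and slides an input row past it to produce a row of partial sums; partial-sum rows from vertically adjacent PEs are then accumulated to form one output row.

```python
def pe_row_conv(filter_row, input_row):
    """1-D convolution primitive mapped to a single PE (illustrative).
    The filter row stays resident ("stationary") in the PE's local
    storage, so every output element reuses all weights locally
    instead of re-fetching them from a distant memory."""
    R = len(filter_row)
    W = len(input_row)
    # One partial-sum row of length W - R + 1.
    return [sum(filter_row[r] * input_row[w + r] for r in range(R))
            for w in range(W - R + 1)]

def conv2d_rs(filter2d, input2d):
    """2-D convolution assembled from row primitives (illustrative).
    In the real spatial array, the partial-sum rows computed by
    vertically adjacent PEs are accumulated via direct inter-PE
    communication; here that accumulation is a plain element-wise sum."""
    R = len(filter2d)
    H = len(input2d)
    out = []
    for y in range(H - R + 1):
        # Each of the R PEs in a column handles one filter row.
        rows = [pe_row_conv(filter2d[r], input2d[y + r]) for r in range(R)]
        # Element-wise accumulation of the partial-sum rows.
        out.append([sum(col) for col in zip(*rows)])
    return out
```

This captures only the data-reuse pattern (stationary weights, sliding inputs, locally accumulated partial sums); the actual architecture additionally maps hundreds of filters and channels across the PE array and exploits spatial parallelism, which this sketch omits.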