ACM/EDAC/IEEE Design Automation Conference

C-Brain: A Deep Learning Accelerator that Tames the Diversity of CNNs through Adaptive Data-level Parallelization

Abstract

Convolutional neural network (CNN) accelerators have been proposed as an efficient hardware solution for deep-learning-based applications, which are known to be both compute- and memory-intensive. Although the most advanced CNN accelerators can deliver high computational throughput, their performance is highly unstable. Once the workload changes to a new network with different parameters, such as layers and kernel size, the fixed hardware structure may no longer match the data flows well. Consequently, the accelerator fails to deliver high performance because either the logic resources or the memory bandwidth are underutilized. To overcome this problem, we propose a novel deep learning accelerator that offers multiple types of data-level parallelism: inter-kernel, intra-kernel, and hybrid. Our design can adaptively switch among the three types of parallelism and the corresponding data tiling schemes to dynamically match different networks, or even different layers of a single network. Regardless of how the hardware configuration or network type changes, the proposed network mapping strategy ensures optimal performance and energy efficiency. Compared with previous state-of-the-art NN accelerators, the design can achieve a speedup of 4.0×-8.3× on some layers of well-known large-scale CNNs. Over the whole network forward-propagation phase, our design achieves a 28.04% PE energy saving and a 90.3% on-chip memory energy saving on average.
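
The underutilization argument can be made concrete with a toy model. The sketch below is not the paper's mapping strategy. It assumes, purely for illustration, that a processing element has a fixed number of multiplier lanes, that inter-kernel parallelism spreads a layer's input channels across those lanes, and that intra-kernel parallelism spreads the k×k elements of one kernel window across them. The names `lane_utilization` and `pick_scheme`, the 16-lane default, and the example layer shapes are all hypothetical, and the hybrid scheme is not modeled.

```python
import math


def lane_utilization(work_items: int, lanes: int) -> float:
    """Fraction of lane slots doing useful work when `work_items`
    are processed in batches of `lanes` (the last batch may be partial)."""
    passes = math.ceil(work_items / lanes)
    return work_items / (passes * lanes)


def pick_scheme(in_channels: int, kernel_size: int, lanes: int = 16):
    """Choose the scheme with the higher lane utilization for one layer.

    Assumed mapping (illustrative only):
      inter-kernel: one lane per input channel
      intra-kernel: one lane per element of a kernel_size x kernel_size window
    """
    inter = lane_utilization(in_channels, lanes)
    intra = lane_utilization(kernel_size * kernel_size, lanes)
    if inter >= intra:
        return "inter-kernel", inter
    return "intra-kernel", intra


if __name__ == "__main__":
    # Hypothetical layer shapes: a first layer with few input channels and a
    # large kernel, and a deeper layer with many channels and a small kernel.
    for name, channels, k in [("early layer", 3, 11), ("deep layer", 256, 3)]:
        scheme, util = pick_scheme(channels, k)
        print(f"{name}: {scheme} (lane utilization {util:.2f})")
```

Under these assumptions, a layer with 3 input channels keeps only 3 of 16 lanes busy with inter-kernel parallelism, while a layer with 256 channels and a 3×3 kernel fills every lane. This is the kind of per-layer mismatch that motivates switching schemes between layers.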