
C-Brain: A Deep Learning Accelerator that Tames the Diversity of CNNs through Adaptive Data-level Parallelization


Abstract

Convolutional neural network (CNN) accelerators have been proposed as an efficient hardware solution for deep-learning-based applications, which are known to be both compute- and memory-intensive. Although the most advanced CNN accelerators can deliver high computational throughput, their performance is highly unstable. Once an accelerator is changed to accommodate a new network with different parameters, such as layers and kernel size, the fixed hardware structure may no longer match the data flows well. Consequently, the accelerator fails to deliver high performance because either logic resources or memory bandwidth are underutilized. To overcome this problem, we propose a novel deep learning accelerator that offers multiple types of data-level parallelism: inter-kernel, intra-kernel, and hybrid. Our design can adaptively switch among the three types of parallelism and the corresponding data tiling schemes to dynamically match different networks or even different layers of a single network. No matter how the hardware configuration or network type changes, the proposed network mapping strategy ensures optimal performance and energy efficiency. Compared with previous state-of-the-art NN accelerators, it can achieve a speedup of 4.0×-8.3× for some layers of well-known large-scale CNNs. Over the whole forward-propagation phase, our design achieves 28.04% PE energy saving and 90.3% on-chip memory energy saving on average.


Bibliographic record

Source
ACM/EDAC/IEEE Design Automation Conference | 2016 | pp. 505-1035 | 6 pages
Authors
Lili Song; Ying Wang; Yinhe Han; Xin Zhao; Bosheng Liu; Xiaowei Li;
Original format: PDF
Chinese Library Classification: TP2-53

Similar literature

1. The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism [J]. Oyama Yosuke, Maruyama Naoya, Dryden Nikoli. IEEE Transactions on Parallel and Distributed Systems, 2021, Issue 7.
2. pLoc_Deep-mVirus: A CNN Model for Predicting Subcellular Localization of Virus Proteins by Deep Learning [J]. Yutao Shao, Kuo-Chen Chou. Advances in Biological Chemistry, 2020, Issue 6.
3. pLoc_Deep-mVirus: A CNN Model for Predicting Subcellular Localization of Virus Proteins by Deep Learning [J]. Yutao Shao, Kuo-Chen Chou. Natural science, 2020, Issue 6.
4. C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization [C]. Lili Song, Ying Wang, Yinhe Han. ACM/EDAC/IEEE Design Automation Conference, 2016.
5. Secure Deep Learning Accelerators [D]. Mera Collantes, Maria I. 2021.
6. RAC-CNN: multimodal deep learning based automatic detection and classification of rod and cone photoreceptors in adaptive optics scanning light ophthalmoscope images [O]. David Cunefare, Alison L. Huckenpahler, Emily J. Patterson. 2019.
7. Adaptive Deep Learning for Time-Varying Systems With Hidden Parameters: Predicting Changing Input Beam Distributions of Compact Particle Accelerators [O]. Alexander Scheinker, Frederick Cropp, Sergio Paiagua. 2021.
