Design, Automation and Test in Europe Conference and Exhibition (DATE)

Learn-to-Scale: Parallelizing Deep Learning Inference on Chip Multiprocessor Architecture

Abstract

Accelerating deep neural networks on resource-constrained embedded devices is becoming increasingly important for real-time applications. However, in contrast to the intensive research on specialized neural network inference architectures, there has been little study of the acceleration and parallelization of deep learning inference on embedded chip-multiprocessor architectures, which many real-time applications favor for their superb energy efficiency and scalability. In this work, we investigate strategies for parallelizing single-pass deep neural network inference on embedded on-chip multi-core accelerators. These methods exploit the elasticity and noise tolerance of deep learning algorithms to circumvent the bottleneck of on-chip inter-core data movement and to reduce the communication overhead that grows as the core count scales up. The experimental results show that the communication-aware sparsified parallelization method improves system performance by 1.6×-1.1× and achieves 4×-1.6× better interconnect energy efficiency for different neural networks.
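
The page carries only the abstract, but the core idea it describes, trading a little activation fidelity for less inter-core traffic, can be illustrated with a short sketch. The code below is a hedged illustration, not the authors' implementation: it simulates an output-partitioned fully connected layer in which each "core" broadcasts only its top-k largest-magnitude activations. All names here (NUM_CORES, KEEP_RATIO, sparsify_top_k, parallel_fc_layer) are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of communication-aware sparsified parallelization,
# assuming output partitioning of a fully connected layer across cores.
import numpy as np

NUM_CORES = 4      # simulated on-chip cores (assumed value)
KEEP_RATIO = 0.25  # fraction of activations each core broadcasts (assumed)

def sparsify_top_k(x, keep_ratio):
    """Keep only the largest-magnitude entries; zero the rest.
    DNNs tolerate this noise, and zeroed entries need not cross
    the on-chip interconnect."""
    k = max(1, int(len(x) * keep_ratio))
    mask = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the top-k magnitudes
    mask[idx] = 1.0
    return x * mask

def parallel_fc_layer(x, W, b):
    """Core i computes its slice of the output, applies ReLU, then
    'broadcasts' a sparsified version of the slice to the other cores."""
    out_slices = np.array_split(np.arange(W.shape[0]), NUM_CORES)
    gathered = []
    for rows in out_slices:                  # one iteration per core
        local = W[rows] @ x + b[rows]        # local partial result
        local = np.maximum(local, 0.0)       # ReLU activation
        gathered.append(sparsify_top_k(local, KEEP_RATIO))
    return np.concatenate(gathered)          # sparsified all-gather

rng = rng = np.random.default_rng(0)
x = rng.standard_normal(64)
W, b = rng.standard_normal((128, 64)), rng.standard_normal(128)
y = parallel_fc_layer(x, W, b)
print(f"nonzero activations exchanged: {np.count_nonzero(y)} / {y.size}")
```

Under this assumed scheme, the activation traffic per layer shrinks roughly in proportion to KEEP_RATIO, which reflects the abstract's argument that the noise tolerance of deep learning can be spent on relieving the inter-core communication bottleneck as the core count grows.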