Design, Automation and Test in Europe Conference and Exhibition (DATE)

Learn-to-Scale: Parallelizing Deep Learning Inference on Chip Multiprocessor Architecture

Abstract

Accelerating deep neural networks on resource-constrained embedded devices is becoming increasingly important for real-time applications. However, in contrast to the intensive research on specialized neural network inference architectures, there has been little study of the acceleration and parallelization of deep learning inference on embedded chip-multiprocessor architectures, which many real-time applications favor for their superb energy efficiency and scalability. In this work, we investigate strategies for parallelizing single-pass deep neural network inference on embedded on-chip multi-core accelerators. These methods exploit the elasticity and noise tolerance of deep learning algorithms to circumvent the bottleneck of on-chip inter-core data movement and to reduce the communication overhead that grows as the core count scales up. The experimental results show that the communication-aware sparsified parallelization method improves system performance by 1.6×-1.1× and achieves 4×-1.6× better interconnect energy efficiency for different neural networks.
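
The page carries only the abstract, but the core idea it describes, trading a little activation fidelity for less inter-core traffic, can be illustrated with a short sketch. The code below is a hedged illustration, not the authors' implementation: it simulates an output-partitioned fully connected layer in which each "core" broadcasts only its top-k largest-magnitude activations. All names here (NUM_CORES, KEEP_RATIO, sparsify_top_k, parallel_fc_layer) are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of communication-aware sparsified parallelization,
# assuming output partitioning of a fully connected layer across cores.
import numpy as np

NUM_CORES = 4      # simulated on-chip cores (assumed value)
KEEP_RATIO = 0.25  # fraction of activations each core broadcasts (assumed)

def sparsify_top_k(x, keep_ratio):
    """Keep only the largest-magnitude entries; zero the rest.
    DNNs tolerate this noise, and zeroed entries need not cross
    the on-chip interconnect."""
    k = max(1, int(len(x) * keep_ratio))
    mask = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]   # indices of the top-k magnitudes
    mask[idx] = 1.0
    return x * mask

def parallel_fc_layer(x, W, b):
    """Core i computes its slice of the output, applies ReLU, then
    'broadcasts' a sparsified version of the slice to the other cores."""
    out_slices = np.array_split(np.arange(W.shape[0]), NUM_CORES)
    gathered = []
    for rows in out_slices:                  # one iteration per core
        local = W[rows] @ x + b[rows]        # local partial result
        local = np.maximum(local, 0.0)       # ReLU activation
        gathered.append(sparsify_top_k(local, KEEP_RATIO))
    return np.concatenate(gathered)          # sparsified all-gather

rng = rng = np.random.default_rng(0)
x = rng.standard_normal(64)
W, b = rng.standard_normal((128, 64)), rng.standard_normal(128)
y = parallel_fc_layer(x, W, b)
print(f"nonzero activations exchanged: {np.count_nonzero(y)} / {y.size}")
```

Under this assumed scheme, the activation traffic per layer shrinks roughly in proportion to KEEP_RATIO, which reflects the abstract's argument that the noise tolerance of deep learning can be spent on relieving the inter-core communication bottleneck as the core count grows.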