International conference on embedded computer systems: architectures, modeling and simulation

Fully Distributed Deep Learning Inference on Resource-Constrained Edge Devices



Abstract

Performing inference tasks of deep learning applications on IoT edge devices ensures privacy of input data and can result in shorter latency compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute a complete Deep Neural Network (DNN). One possible solution is to distribute the DNN across multiple edge devices. For a complete distribution, both fully-connected layers and feature- and weight-intensive convolutional layers need to be partitioned to reduce the amount of computation and data on each resource-constrained edge device. At the same time, the resulting communication overheads need to be considered. Existing work on distributed DNN execution either cannot support all types of networks and layers or does not account for layer fusion opportunities that reduce communication. In this paper, we jointly optimize the memory, computation, and communication demands of distributed execution for complete neural networks covering all layers. This is achieved through techniques that combine feature and weight partitioning with a communication-aware layer fusion approach, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly such that the amount of data exchanged between devices is minimized, optimizing run time. Experimental results for a simulation of six edge devices on 100 Mbit/s connections running the YOLOv2 DNN model show that the schemes evenly balance the memory footprint between devices. The integration of layer fusion additionally reduces communication demands by 14.8%, resulting in a 1.15x speed-up of the inference task compared to partitioning without fusion.
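
To make the partitioning idea concrete: weight partitioning splits a layer's parameters across devices, so each device stores and computes only a slice of the output. Below is a minimal NumPy sketch of this for a fully-connected layer; the helper names and the even row-wise split are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def partition_fc_by_output(W, b, num_devices):
    # Split a fully-connected layer row-wise: each device keeps only the
    # weight rows (output neurons) it is responsible for. Hypothetical
    # helper for illustration; the paper's scheme may differ in detail.
    W_parts = np.array_split(W, num_devices, axis=0)
    b_parts = np.array_split(b, num_devices, axis=0)
    return list(zip(W_parts, b_parts))

def device_fc_forward(x, W_part, b_part):
    # Each device computes its slice of the output; the slices are then
    # concatenated (one gather of activations between devices).
    return W_part @ x + b_part

# Example: a 1000x4096 fully-connected layer split across six devices,
# matching the six-device setup used in the experiments.
rng = np.random.default_rng(0)
W, b = rng.standard_normal((1000, 4096)), rng.standard_normal(1000)
x = rng.standard_normal(4096)
parts = partition_fc_by_output(W, b, num_devices=6)
y = np.concatenate([device_fc_forward(x, Wp, bp) for Wp, bp in parts])
assert np.allclose(y, W @ x + b)  # identical to the unpartitioned layer
```

Under such a split, each device stores roughly one sixth of the layer's weights, which is the kind of even memory balance the experiments report.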
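The communication saving from layer fusion can be sketched in the same way. Under feature (spatial) partitioning, a device that fuses several consecutive convolutional layers fetches its input tile plus a halo covering the fused receptive field only once, and exchanges no intermediate feature maps between layers. The 1-D convolutions and tile boundaries below are simplifying assumptions for illustration only.

```python
import numpy as np

def conv1d_valid(x, k):
    # Plain 'valid' 1-D convolution as a stand-in for a conv layer.
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def fused_tile(x, start, stop, kernels):
    # Compute one spatial tile of the output of several fused conv layers.
    # The device fetches its tile plus a halo once, then runs all fused
    # layers locally with no intermediate communication.
    halo = sum(len(k) - 1 for k in kernels)  # growth of the receptive field
    t = x[start:stop + halo]                 # one input transfer per fused block
    for k in kernels:
        t = conv1d_valid(t, k)
    return t

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
kernels = [rng.standard_normal(3) for _ in range(2)]  # two fused layers

# Reference: unpartitioned execution of both layers in sequence.
ref = conv1d_valid(conv1d_valid(x, kernels[0]), kernels[1])

# Two devices, each producing one spatial half of the final output.
half = len(ref) // 2
out = np.concatenate([fused_tile(x, 0, half, kernels),
                      fused_tile(x, half, len(ref), kernels)])
assert np.allclose(out, ref)
```

Without fusion, devices would exchange border regions of the intermediate feature map after every layer; with fusion, a slightly larger halo is transferred once per fused block, which is the mechanism behind the reported 14.8% reduction in communication demands.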
