International Journal of Parallel Programming
DeeperThings:Fully Distributed CNN Inference on Resource-Constrained Edge Devices

Abstract

Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures privacy of input data and can reduce run time compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices, thereby reducing the amount of computation and data on each device, offers a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved by combining feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize the data exchanged between devices, to optimize run times and to find the entire model's minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme is able to evenly balance the memory footprint between devices. For six devices on 100 Mbit/s connections, integrating layer fusion additionally reduces communication demands by up to 28.8%. This speeds up the inference task by up to 1.52x compared to layer partitioning without fusion.
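The joint choice between feature and weight partitioning described in the abstract can be illustrated with a toy search. The sketch below is not the paper's ILP formulation: it brute-forces a per-layer scheme choice for a hypothetical three-layer network under an invented cost model (boundary exchange for feature partitioning, partial-sum aggregation for weight partitioning) and an assumed per-device memory cap. All sizes, names and cost formulas are made up for illustration.

```python
from itertools import product

# Hypothetical per-layer data volumes (KB) for a toy 3-layer CNN;
# all numbers below are invented for illustration.
FEATURES = [64, 32, 16]   # output feature-map size per layer
WEIGHTS = [8, 128, 512]   # weight size per layer
N = 4                     # number of edge devices
MEM_CAP = 200             # assumed per-device memory budget (KB)

def layer_cost(scheme, feat, n):
    """Toy communication model for one layer:
    feature partitioning exchanges boundary (halo) features,
    weight partitioning aggregates partial output sums."""
    if scheme == "feature":
        return feat * (n - 1) / n
    return feat * (n - 1)

def per_device_memory(plan, weights, n):
    """Feature partitioning replicates a layer's weights on every
    device; weight partitioning splits them n ways."""
    return sum(w if s == "feature" else w / n
               for s, w in zip(plan, weights))

def best_plan(features, weights, n, mem_cap):
    """Exhaustive stand-in for the ILP: pick per-layer schemes that
    minimize total communication subject to the memory cap."""
    best = None
    for plan in product(["feature", "weight"], repeat=len(features)):
        if per_device_memory(plan, weights, n) > mem_cap:
            continue
        cost = sum(layer_cost(s, f, n)
                   for s, f in zip(plan, features))
        if best is None or cost < best[1]:
            best = (plan, cost)
    return best

plan, cost = best_plan(FEATURES, WEIGHTS, N, MEM_CAP)
```

In this toy instance the search keeps the weight-heavy later layers weight-partitioned (to fit the memory cap) and the feature-heavy first layer feature-partitioned (to keep communication low). Exhaustive search is viable only for a handful of layers; an ILP solver, as used by DeeperThings, scales to full networks and can additionally model layer fusion across adjacent layers.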
