International Journal of Parallel Programming
DeeperThings:Fully Distributed CNN Inference on Resource-Constrained Edge Devices

Abstract

Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures privacy of input data and can reduce run time compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices, thereby reducing the amount of computation and data on each device, offers a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved by combining feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize the data exchanged between devices, to optimize run times and to find the entire model's minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme is able to evenly balance the memory footprint between devices. For six devices on 100 Mbit/s connections, integrating layer fusion additionally reduces communication demands by up to 28.8%. This speeds up the inference task by up to 1.52x compared to layer partitioning without fusion.
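The joint choice between feature and weight partitioning described in the abstract can be illustrated with a toy search. The sketch below is not the paper's ILP formulation: it brute-forces a per-layer scheme choice for a hypothetical three-layer network under an invented cost model (boundary exchange for feature partitioning, partial-sum aggregation for weight partitioning) and an assumed per-device memory cap. All sizes, names and cost formulas are made up for illustration.

```python
from itertools import product

# Hypothetical per-layer data volumes (KB) for a toy 3-layer CNN;
# all numbers below are invented for illustration.
FEATURES = [64, 32, 16]   # output feature-map size per layer
WEIGHTS = [8, 128, 512]   # weight size per layer
N = 4                     # number of edge devices
MEM_CAP = 200             # assumed per-device memory budget (KB)

def layer_cost(scheme, feat, n):
    """Toy communication model for one layer:
    feature partitioning exchanges boundary (halo) features,
    weight partitioning aggregates partial output sums."""
    if scheme == "feature":
        return feat * (n - 1) / n
    return feat * (n - 1)

def per_device_memory(plan, weights, n):
    """Feature partitioning replicates a layer's weights on every
    device; weight partitioning splits them n ways."""
    return sum(w if s == "feature" else w / n
               for s, w in zip(plan, weights))

def best_plan(features, weights, n, mem_cap):
    """Exhaustive stand-in for the ILP: pick per-layer schemes that
    minimize total communication subject to the memory cap."""
    best = None
    for plan in product(["feature", "weight"], repeat=len(features)):
        if per_device_memory(plan, weights, n) > mem_cap:
            continue
        cost = sum(layer_cost(s, f, n)
                   for s, f in zip(plan, features))
        if best is None or cost < best[1]:
            best = (plan, cost)
    return best

plan, cost = best_plan(FEATURES, WEIGHTS, N, MEM_CAP)
```

In this toy instance the search keeps the weight-heavy later layers weight-partitioned (to fit the memory cap) and the feature-heavy first layer feature-partitioned (to keep communication low). Exhaustive search is viable only for a handful of layers; an ILP solver, as used by DeeperThings, scales to full networks and can additionally model layer fusion across adjacent layers.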
