Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters



Abstract

Edge computing has emerged as a trend to improve scalability, reduce overhead, and preserve privacy by processing large-scale data, e.g., in deep learning applications, locally at the source. In IoT networks, edge devices are characterized by tight resource constraints and the often dynamic nature of data sources, where existing approaches for deploying Deep/Convolutional Neural Networks (DNNs/CNNs) can only meet IoT constraints by severely reducing accuracy or by using a static distribution that cannot adapt to dynamic IoT environments. In this paper, we propose DeepThings, a framework for adaptively distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters. DeepThings employs a scalable Fused Tile Partitioning (FTP) of convolutional layers to minimize memory footprint while exposing parallelism. It further realizes a distributed work stealing approach to enable dynamic workload distribution and balancing at inference runtime. Finally, we employ a novel work scheduling process to improve data reuse and reduce overall execution latency. Results show that our proposed FTP method can reduce memory footprint by more than 68% without sacrificing accuracy. Furthermore, compared to existing work sharing methods, our distributed work stealing and work scheduling improve throughput by 1.7×-2.2× with multiple dynamic data sources. When combined, DeepThings provides scalable CNN inference speedups of 1.7×-3.5× on 2-6 edge devices with less than 23 MB memory each.
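The core idea behind Fused Tile Partitioning can be illustrated with a short sketch: partition the output feature map of a fused stack of convolutional layers into a grid of tiles, then walk backwards through the layers to compute the (overlapping) input region each tile requires, so that every tile can be executed independently on a different edge device without materializing full intermediate feature maps. The function names, the one-dimensional simplification, and the omission of padding are illustrative assumptions, not details from the paper.

```python
# Hypothetical, simplified 1-D sketch of the FTP idea (padding ignored).
# Each layer is described only by its (kernel, stride); real CNNs would
# also account for padding and two spatial dimensions.

def input_range(out_lo, out_hi, kernel, stride):
    """Input index range needed to produce output indices [out_lo, out_hi]
    for one convolutional layer along one dimension."""
    return out_lo * stride, out_hi * stride + kernel - 1

def ftp_tiles(out_size, grid, layers):
    """Partition the final output of a fused layer stack into `grid` tiles
    and back-propagate each tile's footprint to the original input.
    layers: list of (kernel, stride) from first to last fused layer.
    Returns per-tile input ranges, which overlap at tile boundaries."""
    step = out_size // grid
    tiles = []
    for g in range(grid):
        lo, hi = g * step, min((g + 1) * step, out_size) - 1
        # Walk backwards through the fused layers to find the input region.
        for kernel, stride in reversed(layers):
            lo, hi = input_range(lo, hi, kernel, stride)
        tiles.append((lo, hi))
    return tiles

# Example: two fused 3x3-style conv layers (stride 1), 16-wide output,
# 4 tiles. Adjacent tiles overlap by a few input elements, which is the
# price FTP pays for making tiles independently executable.
print(ftp_tiles(16, 4, [(3, 1), (3, 1)]))
```

Because each tile depends only on its own input region, a work-stealing runtime can hand individual tiles to whichever edge device is idle, which is what enables the dynamic load balancing described in the abstract.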