
DNN Placement and Inference in Edge Computing



Abstract

The deployment of deep neural network (DNN) models in software applications is increasing rapidly with the exponential growth of artificial intelligence. Currently, developers deploy such models manually in the cloud, weighing several user requirements, and the decisions of model selection and user assignment are difficult to make. With the rise of the edge computing paradigm, companies tend to deploy applications as close to the user as possible. In this setting, the problem of DNN model selection and inference serving becomes harder because of the communication latency introduced between nodes. We present an automatic method for DNN placement and inference in edge computing: a mathematical formulation of the DNN Model Variant Selection and Placement (MVSP) problem that accounts for the inference latency of different model-variants, the communication latency between nodes, and the utilization cost of edge computing nodes. We further propose a general heuristic algorithm to solve the MVSP problem. We analyze the effects of hardware sharing on inference latency, using GPU edge computing nodes shared between different DNN model-variants as an example. Evaluating our model numerically, we show the potential of GPU sharing: average per-request latency (on the millisecond scale) decreases by 33% under low load and by 21% under high load. We study the tradeoff between latency and cost and show the Pareto-optimal curves. Finally, we compare the optimal solution with the proposed heuristic and show that the heuristic increases the average latency per request by more than 60%; this gap could be narrowed by more efficient placement algorithms.
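
The abstract does not reproduce the MVSP formulation itself. As a rough illustration of its likely shape, one plausible mixed-integer sketch is given below; every variable and symbol here is an assumption for illustration, not the paper's notation.

\min_{x,y}\;\; \alpha \sum_{u \in U}\sum_{v \in V}\sum_{n \in N} y_{u,v,n}\,\bigl(d_{u,n} + \ell_{v,n}\bigr) \;+\; \beta \sum_{v \in V}\sum_{n \in N} c_{n}\, x_{v,n}

\text{s.t.}\quad \sum_{v \in V}\sum_{n \in N} y_{u,v,n} = 1 \;\;\forall u \in U, \qquad y_{u,v,n} \le x_{v,n}, \qquad \sum_{v \in V} r_{v}\, x_{v,n} \le R_{n} \;\;\forall n \in N

Here $x_{v,n}\in\{0,1\}$ places model-variant $v$ on edge node $n$, $y_{u,v,n}\in\{0,1\}$ assigns user $u$ to that placement, $d_{u,n}$ is the user-to-node communication latency, $\ell_{v,n}$ the inference latency of variant $v$ on node $n$, $c_n$ the node utilization cost, $r_v$ the variant's resource demand, $R_n$ the node capacity, and $\alpha,\beta$ weight the latency/cost tradeoff the paper studies.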
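The abstract likewise leaves the heuristic unspecified. A minimal greedy sketch in Python, under the assumption that per-pair latencies and node capacities are given as inputs (all names below are hypothetical), could look like this:

from itertools import product

def greedy_mvsp(users, variants, nodes, comm, infer, demand, capacity):
    """Greedy MVSP sketch (illustrative, not the paper's algorithm).
    comm[u][n]: user-to-node communication latency
    infer[v][n]: inference latency of variant v on node n
    demand[v]: resource need of variant v; capacity[n]: node budget."""
    placed = set()                      # (variant, node) pairs already deployed
    used = {n: 0.0 for n in nodes}      # resources consumed per node
    assignment = {}
    for u in users:
        best, best_lat = None, float("inf")
        for v, n in product(variants, nodes):
            # A variant consumes capacity only the first time it is placed,
            # mirroring the hardware-sharing idea in the paper.
            extra = 0.0 if (v, n) in placed else demand[v]
            if used[n] + extra > capacity[n]:
                continue
            lat = comm[u][n] + infer[v][n]
            if lat < best_lat:
                best, best_lat = (v, n), lat
        if best is None:
            raise RuntimeError(f"no feasible placement for user {u}")
        v, n = best
        if (v, n) not in placed:
            placed.add((v, n))
            used[n] += demand[v]
        assignment[u] = (v, n, best_lat)
    return assignment, placed

Assigning users one at a time and reusing already-placed variants keeps each (variant, node) deployment charged against node capacity only once; the more-than-60% latency gap reported against the optimum suggests the paper's own heuristic is similarly simple.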
