IEEE Transactions on Cognitive Communications and Networking

Delay-Aware VNF Scheduling: A Reinforcement Learning Approach With Variable Action Set

Abstract

Software defined networking (SDN) and network function virtualization (NFV) are the key enabling technologies for service customization in next generation networks to support various applications. In such a circumstance, virtual network function (VNF) scheduling plays an essential role in enhancing resource utilization and achieving better quality-of-service (QoS). In this paper, the VNF scheduling problem is investigated to minimize the makespan (i.e., overall completion time) of all services, while satisfying their different end-to-end (E2E) delay requirements. The problem is formulated as a mixed integer linear program (MILP) which is NP-hard with exponentially increasing computational complexity as the network size expands. To solve the MILP with high efficiency and accuracy, the original problem is reformulated as a Markov decision process (MDP) problem with variable action set. Then, a reinforcement learning (RL) algorithm is developed to learn the best scheduling policy by continuously interacting with the network environment. The proposed learning algorithm determines the variable action set at each decision-making state and captures different execution time of the actions. The reward function in the proposed algorithm is carefully designed to realize delay-aware VNF scheduling. Simulation results are presented to demonstrate the convergence and high accuracy of the proposed approach against other benchmark algorithms.
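Only the abstract is available on this page, but the mechanism it describes, Q-learning over a scheduling MDP whose feasible action set changes from state to state, can be sketched concretely. The following Python sketch is a minimal toy under stated assumptions, not the authors' implementation: the two example service chains, the one-node-per-VNF-type placement, the deadline values, the learning hyperparameters, and the reward shaping (negative makespan growth plus a fixed penalty for a missed E2E deadline) are all illustrative choices introduced here.

```python
import random
from collections import defaultdict

# Toy instance (assumed): each service is a chain of (vnf_type, proc_time)
# stages, and each VNF type runs on exactly one hosting node.
SERVICES = {
    "s1": [("fw", 2), ("nat", 1)],
    "s2": [("nat", 2), ("ids", 3)],
}
NODE_OF = {"fw": "n1", "nat": "n2", "ids": "n3"}
DEADLINE = {"s1": 5, "s2": 7}              # assumed E2E delay budgets

def actions(progress):
    """Variable action set: the next unscheduled stage of every unfinished
    service. An action is a (service, stage_index) pair."""
    return [(s, i) for s, chain in SERVICES.items()
            if (i := progress[s]) < len(chain)]

def step(state, action):
    """Schedule one ready VNF and return (next_state, reward). The reward is
    the negative growth of the makespan, so an episode's return is -makespan,
    minus a penalty whenever a completed service misses its deadline."""
    progress, node_t, svc_t = state
    s, i = action
    vnf, dur = SERVICES[s][i]
    node = NODE_OF[vnf]
    start = max(node_t[node], svc_t[s])    # node availability + chain order
    finish = start + dur                   # actions take different times
    old_makespan = max(node_t.values())
    node_t = {**node_t, node: finish}
    svc_t = {**svc_t, s: finish}
    progress = {**progress, s: i + 1}
    reward = old_makespan - max(node_t.values())
    if progress[s] == len(SERVICES[s]) and finish > DEADLINE[s]:
        reward -= 10.0                     # delay awareness via the reward
    return (progress, node_t, svc_t), reward

def key(state):                            # hashable key for the Q-table
    return tuple(tuple(sorted(d.items())) for d in state)

def initial():
    return ({s: 0 for s in SERVICES},          # per-service progress
            {n: 0 for n in NODE_OF.values()},  # node-available times
            {s: 0 for s in SERVICES})          # service-available times

Q = defaultdict(float)
ALPHA, GAMMA, EPS = 0.3, 1.0, 0.2
for _ in range(5000):                      # training episodes
    state = initial()
    while (acts := actions(state[0])):     # until every VNF is scheduled
        a = (random.choice(acts) if random.random() < EPS
             else max(acts, key=lambda x: Q[(key(state), x)]))
        nxt, r = step(state, a)
        # The TD target maximizes over the NEXT state's feasible actions only.
        best = max((Q[(key(nxt), b)] for b in actions(nxt[0])), default=0.0)
        Q[(key(state), a)] += ALPHA * (r + GAMMA * best - Q[(key(state), a)])
        state = nxt

# Greedy rollout with the learned values reports the achieved schedule.
state = initial()
while (acts := actions(state[0])):
    state, _ = step(state, max(acts, key=lambda x: Q[(key(state), x)]))
print("makespan:", max(state[1].values()), "finish times:", state[2])
```

The mechanical point the sketch isolates is that both action selection and the temporal-difference target range only over the feasible actions of the current and next states, and that each action advances simulated time by its own execution duration, matching the abstract's variable action set and the differing execution times of the actions.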
