IEEE Journal on Selected Areas in Communications

Neural Combinatorial Deep Reinforcement Learning for Age-Optimal Joint Trajectory and Scheduling Design in UAV-Assisted Networks

Abstract

In this article, an unmanned aerial vehicle (UAV)-assisted wireless network is considered in which a battery-constrained UAV is assumed to move towards energy-constrained ground nodes to receive status updates about their observed processes. The UAV's flight trajectory and scheduling of status updates are jointly optimized with the objective of minimizing the normalized weighted sum of Age of Information (NWAoI) values for different physical processes at the UAV. The problem is first formulated as a mixed-integer program. Then, for a given scheduling policy, a convex optimization-based solution is proposed to derive the UAV's optimal flight trajectory and the time instants of the updates. However, finding the optimal scheduling policy is challenging due to the combinatorial nature of the formulated problem. Therefore, to complement the proposed convex optimization-based solution, a finite-horizon Markov decision process (MDP) is used to find the optimal scheduling policy. Since the state space of the MDP is extremely large, a novel neural combinatorial-based deep reinforcement learning (NCRL) algorithm using a deep Q-network (DQN) is proposed to obtain the optimal policy. However, for large-scale scenarios with numerous nodes, the DQN architecture can no longer learn the optimal scheduling policy efficiently. Motivated by this, a long short-term memory (LSTM)-based autoencoder is proposed to map the state space to a fixed-size vector representation in such large-scale scenarios while capturing the spatio-temporal interdependence between the update locations and time instants. A lower bound on the minimum NWAoI is analytically derived, which provides system design guidelines on the appropriate choice of importance weights for different nodes. Furthermore, an upper bound on the UAV's minimum speed is obtained to achieve this lower bound value. The numerical results also demonstrate that the proposed NCRL approach can significantly improve the achievable NWAoI per process compared to the baseline policies, such as weight-based and discretized state DQN policies.
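To make the objective in the abstract concrete, the sketch below simulates a generic discrete-time Age of Information (AoI) model and reports a weighted sum of ages after each slot. It is an illustration only, not the paper's formulation: the function name `weighted_aoi_trace`, the slotted time model, the reset-to-one rule, and normalizing by the sum of the weights are all assumptions made here, and the UAV's trajectory and energy constraints are ignored.

```python
from typing import List, Optional, Sequence


def weighted_aoi_trace(
    schedule: Sequence[Optional[int]],
    weights: Sequence[float],
    initial_age: int = 1,
) -> List[float]:
    """Return the weight-normalized sum of ages after each time slot.

    schedule[t] is the index of the process updated in slot t, or None
    if no update is received in that slot (hypothetical slotted model).
    """
    ages = [initial_age] * len(weights)
    total_weight = sum(weights)
    trace = []
    for scheduled in schedule:
        # Every process ages by one slot; the updated one resets to age 1.
        ages = [1 if i == scheduled else a + 1 for i, a in enumerate(ages)]
        # Assumed normalization: divide the weighted sum by the total weight.
        trace.append(sum(w * a for w, a in zip(weights, ages)) / total_weight)
    return trace


if __name__ == "__main__":
    # Two equally weighted processes: update 0, then 1, then nothing.
    print(weighted_aoi_trace([0, 1, None], [1.0, 1.0]))
```

Under this toy model, a scheduling policy is simply the `schedule` sequence, and the combinatorial difficulty mentioned in the abstract corresponds to the number of such sequences growing exponentially with the horizon.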
