Journal: IEEE Transactions on Wireless Communications

Multi-Agent Reinforcement Learning-Based Resource Allocation for UAV Networks



Abstract

Unmanned aerial vehicles (UAVs) can serve as aerial base stations (BSs) that provide cost-effective, on-demand wireless communications. This article investigates dynamic resource allocation in multi-UAV-enabled communication networks with the goal of maximizing long-term rewards. In particular, each UAV communicates with a ground user by autonomously selecting its communicating user, power level, and subchannel without any information exchange among UAVs. To model the dynamics and uncertainty of the environment, we formulate the long-term resource allocation problem as a stochastic game for maximizing the expected rewards, where each UAV is a learning agent and each resource allocation solution corresponds to an action taken by the UAVs. We then develop a multi-agent reinforcement learning (MARL) framework in which each agent learns its best strategy from its local observations. More specifically, we propose an agent-independent method in which all agents run a decision algorithm independently but share a common structure based on Q-learning. Finally, simulation results reveal that: 1) appropriately chosen exploitation and exploration parameters enhance the performance of the proposed MARL-based resource allocation algorithm; 2) the proposed MARL algorithm provides acceptable performance compared to the case with complete information exchange among UAVs. It thereby strikes a good tradeoff between performance gains and information exchange overhead.
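The agent-independent idea in the abstract can be illustrated with a minimal sketch of independent tabular Q-learning, in which every UAV keeps its own Q-table over (user, power level, subchannel) actions and learns only from its own reward. This is not the paper's algorithm or system model: the class name `IndependentQAgent`, the single-state environment, the `toy_rewards` collision rule, and all parameter values are assumptions made up for illustration.

```python
import itertools
import random


class IndependentQAgent:
    """One UAV as an independent tabular Q-learner (illustrative sketch)."""

    def __init__(self, n_states, users, power_levels, subchannels,
                 alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        # Each action is one (user, power level, subchannel) combination.
        self.actions = list(itertools.product(users, power_levels, subchannels))
        self.q = [[0.0] * len(self.actions) for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def select_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update using only this agent's own experience.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def toy_rewards(chosen):
    """Hypothetical reward: 1 if a UAV's subchannel is not shared, else 0."""
    subchannels = [action[2] for action in chosen]
    return [1.0 if subchannels.count(s) == 1 else 0.0 for s in subchannels]


def train(n_agents=2, steps=2000):
    # Assumed toy setup: two users, two power levels, two subchannels.
    agents = [IndependentQAgent(1, users=[0, 1], power_levels=[0.1, 0.2],
                                subchannels=[0, 1], seed=i)
              for i in range(n_agents)]
    state = 0  # single-state toy environment
    for _ in range(steps):
        idx = [a.select_action(state) for a in agents]
        rewards = toy_rewards([a.actions[i] for a, i in zip(agents, idx)])
        for a, i, r in zip(agents, idx, rewards):
            a.update(state, i, r, state)  # no information exchange among agents
    return agents
```

The `epsilon` parameter is the exploration/exploitation knob that the abstract's first simulation finding refers to; in this sketch it is a fixed constant, whereas a real study would tune or schedule it.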
