IEEE Transactions on Wireless Communications

The Application of Deep Reinforcement Learning to Distributed Spectrum Access in Dynamic Heterogeneous Environments With Partial Observations


Abstract

This paper(1) investigates deep reinforcement learning (DRL) based on a Recurrent Neural Network (RNN) for Dynamic Spectrum Access (DSA) under partial observations, referred to as a Deep Recurrent Q-Network (DRQN). Specifically, we consider a scenario with multiple independent channels and multiple heterogeneous Primary Users (PUs). Two key challenges in our problem formulation are that the DRQN node is assumed to have no prior knowledge of the other nodes' behavior patterns and that it must predict the future channel state from previous observations. The goal of the DRQN is to learn a channel access strategy with a low collision rate but a high channel utilization rate. With proper definitions of the state, action, and reward, our extensive simulation results show that a DRQN-based approach can handle a variety of communication environments, including dynamic environments. Further, our results show that the DRQN node is also able to cope with multi-rate and multi-agent scenarios. Importantly, we show the following benefits of using recurrent neural networks in DSA: (i) the ability to learn the optimal strategy in different environments under partial observations; (ii) robustness to imperfect observations; (iii) the ability to utilize multiple channels; and (iv) robustness in the presence of multiple agents. (1) A portion of this work was presented at MILCOM 2018 in [1].
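The abstract describes a DRQN that aggregates a history of partial channel observations through a recurrent network and outputs Q-values over channel-access actions. The paper itself does not provide code, so the sketch below is only illustrative: the layer sizes, the observation encoding (one float per channel), the action set (transmit on one channel or stay silent), and the epsilon-greedy selection are assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn


    class DRQN(nn.Module):
        """Recurrent Q-network: aggregates past partial channel observations
        through a GRU and outputs Q-values over channel-access actions."""

        def __init__(self, num_channels: int, hidden_size: int = 64):
            super().__init__()
            # Observation: one float per channel (e.g. last sensed busy/idle);
            # this encoding is an assumption, not taken from the paper.
            self.obs_dim = num_channels
            # Actions: transmit on one of the channels, or stay silent.
            self.num_actions = num_channels + 1
            self.rnn = nn.GRU(self.obs_dim, hidden_size, batch_first=True)
            self.q_head = nn.Linear(hidden_size, self.num_actions)

        def forward(self, obs_seq, hidden=None):
            # obs_seq: (batch, time, obs_dim). The recurrence lets the node
            # infer channel dynamics it cannot observe directly at each step.
            out, hidden = self.rnn(obs_seq, hidden)
            return self.q_head(out), hidden  # Q-values: (batch, time, num_actions)


    def select_action(net, obs, hidden, epsilon=0.05):
        """Epsilon-greedy choice for a single time step (illustrative value)."""
        with torch.no_grad():
            q, hidden = net(obs.view(1, 1, -1), hidden)
        if torch.rand(1).item() < epsilon:
            action = int(torch.randint(net.num_actions, (1,)).item())
        else:
            action = int(q[0, -1].argmax().item())
        return action, hidden

Training such a network would follow standard DQN-style temporal-difference updates over observation sequences, with a reward that is, for example, positive on a successful transmission and negative on a collision, in line with the abstract's stated goal of a low collision rate and high channel utilization; those reward values are likewise assumptions here.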
