Journal: Telecommunication Systems

Adaptive transmission scheduling over fading channels for energy-efficient cognitive radio networks by reinforcement learning


Abstract

In this paper, we address a cross-layer problem: maximizing the long-term average utility of an energy-efficient cognitive radio network carrying packetized data traffic, subject to a constraint on the collision rate with licensed users. Utility is determined by the number of packets transmitted successfully per unit of consumed power and by buffer occupancy. We formulate the problem with a dynamic programming method, namely a constrained Markov decision process (CMDP). A reinforcement learning (RL) approach is employed to find a near-optimal policy in an unknown environment. The policy learned by RL guides the transmitter in accessing available channels and selecting a proper transmission rate at the beginning of each frame, in pursuit of its long-term optimization goal. Several implementation issues of the RL approach are discussed. First, state-space compaction is used to cope with the so-called curse of dimensionality caused by the large state space of the formulated CMDP. Second, action-set reduction is presented to reduce the number of actions available in certain system states. Finally, the CMDP is converted into a corresponding unconstrained Markov decision process (UMDP) via the Lagrangian multiplier approach, and a golden-section search method is proposed to find the proper multiplier. To evaluate the performance of the policy learned by RL, we present two naive policies and compare them against it in simulations.
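The Lagrangian relaxation described in the abstract folds the collision-rate constraint into the reward, after which a model-free RL method can learn a policy. As a minimal sketch (not the paper's actual algorithm), the idea can be illustrated with tabular Q-learning on a hypothetical two-state channel model; all names, the toy environment, and the parameter values here are illustrative assumptions:

```python
import random

random.seed(0)

def q_learning_lagrangian(n_states, n_actions, step, lam,
                          episodes=2000, horizon=50,
                          alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning for a CMDP relaxed via a Lagrange multiplier:
    the per-step reward is utility - lam * collision_indicator."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = random.randrange(n_states)
        for _ in range(horizon):
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, utility, collision = step(s, a)
            r = utility - lam * collision  # Lagrangian-relaxed reward
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Toy environment (an assumption for illustration only):
# state 0 = channel idle, state 1 = channel occupied by a licensed user.
# Action 1 = transmit (utility 1 when idle, collision when occupied);
# action 0 = stay silent. The next channel state is drawn uniformly.
def step(s, a):
    s2 = random.randrange(2)
    utility = 1.0 if (a == 1 and s == 0) else 0.0
    collision = 1.0 if (a == 1 and s == 1) else 0.0
    return s2, utility, collision

Q = q_learning_lagrangian(n_states=2, n_actions=2, step=step, lam=2.0)
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
# A large multiplier penalizes collisions, so the learned policy
# transmits only when the channel is idle.
```

With the multiplier fixed at 2.0, transmitting in the occupied state earns a negative relaxed reward, so the greedy policy transmits only in the idle state; sweeping the multiplier (as the paper does via golden-section search) trades throughput against the collision constraint.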
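The golden-section search mentioned in the abstract is a standard derivative-free method for locating the extremum of a unimodal function by shrinking a bracketing interval by the golden ratio at each step. A generic sketch (the objective here is a placeholder, not the paper's Lagrangian dual):

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi ≈ 0.618

def golden_section_search(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f over [lo, hi] to within tol."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
        c = b - INV_PHI * (b - a)
        d = a + INV_PHI * (b - a)
    return (a + b) / 2

# Placeholder objective: a convex function with its minimum at 2.0,
# standing in for the dual function evaluated at a candidate multiplier.
lam_star = golden_section_search(lambda lam: (lam - 2.0) ** 2, 0.0, 10.0)
```

In the paper's setting, each evaluation of `f` would correspond to solving (or learning) the relaxed UMDP for a candidate multiplier, making a low-evaluation-count line search like this attractive.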
