IEEE Transactions on Signal Processing

${Q}$-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control



Abstract

This paper presents novel ${Q}$-learning based stochastic control algorithms for rate and power control in V-BLAST transmission systems. The algorithms exploit the supermodularity and monotonic structure results derived in the companion paper. The rate and power control problem is posed as a stochastic optimization problem with the goal of minimizing the average transmission power under a constraint on the average delay, which can be interpreted as the quality-of-service requirement of a given application. The standard ${Q}$-learning algorithm is modified to handle the constraint so that it can adaptively learn the structured optimal policy under unknown channel/traffic statistics. We discuss the convergence of the proposed algorithms and explore their properties in simulations. To address the issue of unknown transmission costs in an unknown time-varying environment, we propose a variant of the ${Q}$-learning algorithm in which power costs are estimated in an online fashion, and we show that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.
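The constrained Q-learning idea described in the abstract can be illustrated with a toy Lagrangian-relaxation sketch: minimize power subject to an average-delay budget by augmenting the instantaneous cost with a Lagrange multiplier that is updated on a slower timescale. This is a minimal illustrative sketch, not the paper's algorithm — the buffer dynamics, cost functions, constants, and the omission of the monotone-policy structure are all assumptions made here for brevity.

```python
import random

# Hypothetical toy model: state = buffer occupancy, action = transmission rate.
N_STATES, N_ACTIONS = 4, 2

def power_cost(s, a):   # assumed instantaneous transmission power
    return float(a * (s + 1))

def delay_cost(s, a):   # assumed delay proxy: current buffer occupancy
    return float(s)

def step(s, a, rng):    # toy dynamics: transmission drains, random arrivals fill
    arrival = 1 if rng.random() < 0.6 else 0
    return min(N_STATES - 1, max(0, s - a + arrival))

def constrained_q_learning(delay_budget=1.5, iters=20000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    lam = 0.0               # Lagrange multiplier for the delay constraint
    gamma, eps = 0.95, 0.1  # discount factor, exploration rate
    s, avg_delay = 0, 0.0
    for t in range(1, iters + 1):
        # epsilon-greedy action selection on the Lagrangian Q-values
        if rng.random() < eps:
            a = rng.randrange(N_ACTIONS)
        else:
            a = min(range(N_ACTIONS), key=lambda x: Q[s][x])
        s_next = step(s, a, rng)
        # Lagrangian cost: power plus multiplier-weighted delay
        c = power_cost(s, a) + lam * delay_cost(s, a)
        alpha = 1.0 / (1.0 + 0.001 * t)          # fast-timescale step size
        Q[s][a] += alpha * (c + gamma * min(Q[s_next]) - Q[s][a])
        # slower-timescale dual ascent toward the delay budget
        avg_delay += (delay_cost(s, a) - avg_delay) / t
        beta = 1.0 / (1.0 + 0.01 * t)
        lam = max(0.0, lam + beta * (avg_delay - delay_budget))
        s = s_next
    policy = [min(range(N_ACTIONS), key=lambda a: Q[x][a]) for x in range(N_STATES)]
    return Q, lam, policy
```

In the paper the learned randomized policy is additionally constrained to be monotone in the buffer state (the structural result from the companion paper); this sketch simply returns the greedy deterministic policy from the learned Q-table.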

