IEEE Transactions on Signal Processing

${Q}$-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control



Abstract

This paper presents novel ${Q}$-learning based stochastic control algorithms for rate and power control in V-BLAST transmission systems. The algorithms exploit the supermodularity and monotonic structure results derived in the companion paper. The rate and power control problem is posed as a stochastic optimization problem with the goal of minimizing the average transmission power under a constraint on the average delay, which can be interpreted as the quality-of-service requirement of a given application. The standard ${Q}$-learning algorithm is modified to handle the constraint so that it can adaptively learn the structured optimal policy under unknown channel/traffic statistics. We discuss the convergence of the proposed algorithms and explore their properties in simulations. To address the issue of unknown transmission costs in an unknown time-varying environment, we propose a variant of the ${Q}$-learning algorithm in which power costs are estimated in an online fashion, and we show that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.
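The constrained Q-learning idea described in the abstract can be illustrated with a toy Lagrangian-relaxation sketch: minimize power subject to an average-delay budget by augmenting the instantaneous cost with a Lagrange multiplier that is updated on a slower timescale. This is a minimal illustrative sketch, not the paper's algorithm — the buffer dynamics, cost functions, constants, and the omission of the monotone-policy structure are all assumptions made here for brevity.

```python
import random

# Hypothetical toy model: state = buffer occupancy, action = transmission rate.
N_STATES, N_ACTIONS = 4, 2

def power_cost(s, a):   # assumed instantaneous transmission power
    return float(a * (s + 1))

def delay_cost(s, a):   # assumed delay proxy: current buffer occupancy
    return float(s)

def step(s, a, rng):    # toy dynamics: transmission drains, random arrivals fill
    arrival = 1 if rng.random() < 0.6 else 0
    return min(N_STATES - 1, max(0, s - a + arrival))

def constrained_q_learning(delay_budget=1.5, iters=20000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    lam = 0.0               # Lagrange multiplier for the delay constraint
    gamma, eps = 0.95, 0.1  # discount factor, exploration rate
    s, avg_delay = 0, 0.0
    for t in range(1, iters + 1):
        # epsilon-greedy action selection on the Lagrangian Q-values
        if rng.random() < eps:
            a = rng.randrange(N_ACTIONS)
        else:
            a = min(range(N_ACTIONS), key=lambda x: Q[s][x])
        s_next = step(s, a, rng)
        # Lagrangian cost: power plus multiplier-weighted delay
        c = power_cost(s, a) + lam * delay_cost(s, a)
        alpha = 1.0 / (1.0 + 0.001 * t)          # fast-timescale step size
        Q[s][a] += alpha * (c + gamma * min(Q[s_next]) - Q[s][a])
        # slower-timescale dual ascent toward the delay budget
        avg_delay += (delay_cost(s, a) - avg_delay) / t
        beta = 1.0 / (1.0 + 0.01 * t)
        lam = max(0.0, lam + beta * (avg_delay - delay_budget))
        s = s_next
    policy = [min(range(N_ACTIONS), key=lambda a: Q[x][a]) for x in range(N_STATES)]
    return Q, lam, policy
```

In the paper the learned randomized policy is additionally constrained to be monotone in the buffer state (the structural result from the companion paper); this sketch simply returns the greedy deterministic policy from the learned Q-table.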

