SIAM Journal on Control and Optimization

ACTION TIME SHARING POLICIES FOR ERGODIC CONTROL OF MARKOV CHAINS



Abstract

Ergodic control for discrete time controlled Markov chains with a locally compact state space and a compact action space is considered under suitable stability, irreducibility, and Feller continuity conditions. A flexible family of controls, called action time sharing (ATS) policies, associated with a given continuous stationary Markov control, is introduced. It is shown that the long-term average cost for such a control policy, for a broad range of one-stage cost functions, is the same as that for the associated stationary Markov policy. In addition, ATS policies are well suited for a range of estimation, information collection, and adaptive control goals. To illustrate the possibilities we present two examples. The first demonstrates a construction of an ATS policy that leads to consistent estimators for unknown model parameters while producing the desired long-term average cost value. The second example considers a setting where the target stationary Markov control q is not known but there are sampling schemes available that allow for consistent estimation of q. We construct an ATS policy which uses dynamic estimators for q for control decisions and show that the associated cost coincides with that for the unknown Markov control q.
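The intuition behind the abstract's main claim can be illustrated with a small simulation. The sketch below is not the paper's construction: it uses a hypothetical two-state, two-action chain with made-up transition probabilities and one-stage costs, and an ATS-style policy that deviates from a fixed stationary control q only on a sparse (density-zero) set of times, so the empirical long-run average cost nearly coincides with that of q.

```python
import random

# Hypothetical transition kernel: P[(state, action)] = probability of
# moving to state 1 on the next step (assumed for illustration only).
P = {
    (0, 0): 0.3, (0, 1): 0.7,
    (1, 0): 0.6, (1, 1): 0.2,
}
# Hypothetical one-stage cost function.
COST = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 1.5}

def q(state):
    """A fixed stationary Markov control (assumed given)."""
    return 0 if state == 0 else 1

def simulate(policy, horizon, seed=0):
    """Run the chain under a (possibly time-dependent) policy and
    return the empirical long-run average cost."""
    rng = random.Random(seed)
    x, total = 0, 0.0
    for t in range(horizon):
        a = policy(t, x)
        total += COST[(x, a)]
        x = 1 if rng.random() < P[(x, a)] else 0
    return total / horizon

# The stationary policy ignores the time index.
def stationary(t, x):
    return q(x)

# ATS-style policy: follow q except on a sparse set of exploration
# times (here the perfect squares, which have density zero), where the
# other action is tried, e.g. to collect information about the model.
def ats(t, x):
    r = int(t ** 0.5)
    if r * r == t:
        return 1 - q(x)
    return q(x)

c_stat = simulate(stationary, 200_000)
c_ats = simulate(ats, 200_000)
# Since deviations have density zero, the two averages are close.
```

Only about 450 of the 200,000 decision epochs deviate from q, which is why the two empirical averages agree; the paper establishes the exact coincidence of the limiting costs under its stated conditions.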
