IEEE Conference on Decision and Control

Adversarial Multi-Armed Bandit Approach to Two-Person Zero-Sum Markov Games



Abstract

A sampling-based algorithm for solving stochastic optimization problems, based on Auer et al.'s Exp3 algorithm for "adversarial multi-armed bandit problems," was recently presented by the authors. In particular, the authors recursively extended the Exp3-based algorithm to solve finite-horizon Markov decision processes (MDPs) and analyzed its finite-iteration performance in terms of the expected bias relative to the maximum value of the "recursive sample-average-approximation (SAA)" problem induced by the algorithm's sampling process. They showed that the upper bound on the expected bias approaches zero as the number of samples per sampled state in each stage goes to infinity, so that the algorithm converges in the limit to the optimal value of the original MDP. As a sequel to that work, the idea is further extended to solving two-person zero-sum Markov games (MGs), providing a finite-iteration bound on the equilibrium value of the induced "recursive SAA game" problem and asymptotic convergence to the true equilibrium value. The recursively extended algorithm for MGs can be used to break the curse of dimensionality.
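The building block named in the abstract is Auer et al.'s Exp3 algorithm for adversarial multi-armed bandits. A minimal sketch of standard Exp3 is shown below; the function name, reward interface, and parameter defaults are illustrative and not taken from the paper, which applies Exp3 recursively per sampled state.

```python
import math
import random

def exp3(num_arms, reward_fn, num_rounds, gamma=0.1):
    """Standard Exp3 for adversarial bandits (Auer et al.).

    reward_fn(t, arm) must return a reward in [0, 1].
    Returns the accumulated reward and the final arm weights.
    """
    weights = [1.0] * num_arms
    total_reward = 0.0
    for t in range(num_rounds):
        w_sum = sum(weights)
        # mix the exponential-weights distribution with uniform exploration
        probs = [(1 - gamma) * w / w_sum + gamma / num_arms for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # importance-weighted estimate; only the pulled arm's weight updates
        est = reward / probs[arm]
        weights[arm] *= math.exp(gamma * est / num_arms)
    return total_reward, weights
```

In the recursive scheme the abstract describes, an Exp3 instance of this kind would be run at each sampled state and stage, with the bandit "reward" coming from sampled downstream values rather than a fixed reward function.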


