
Gambler's Ruin Bandit Problem



Abstract

In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action, which moves the learner randomly over the state space around its current state, and a terminal action, which moves the learner directly into one of two terminal states (the goal state or the dead-end state). The current round ends when a terminal state is reached, and the learner incurs a positive reward only when the goal state is reached. The objective of the learner is to maximize its long-term reward (the expected number of times the goal state is reached), without any prior knowledge of the state transition probabilities. We first prove a result characterizing the form of the optimal policy for the GRBP. Then, we define the regret of the learner with respect to an omnipotent oracle, which acts optimally in each round, and prove that it grows logarithmically in the number of rounds. We also identify a condition under which the learner's regret is bounded. A potential application of the GRBP is optimal medical treatment assignment, in which the continuation action corresponds to a conservative treatment and the terminal action corresponds to a risky treatment such as surgery.
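The round structure described in the abstract can be illustrated with a small Monte Carlo sketch. Everything concrete here is an assumption for illustration, not taken from the paper: states are the integers 0..N with 0 as the dead-end and N as the goal, the continuation action is a gambler's-ruin step (up one state with probability `p_up`, else down one), and the terminal action succeeds with a hypothetical state-dependent probability `q[s]`.

```python
import random

def play_round(start, N, p_up, q, policy, rng):
    """Run one GRBP round; return 1 if the goal state N is reached, else 0.

    Hypothetical dynamics: CONTINUE does a +/-1 random walk step;
    TERMINATE jumps straight to the goal with probability q[s],
    otherwise to the dead-end state 0.
    """
    s = start
    while 0 < s < N:
        if policy(s) == "terminate":
            return 1 if rng.random() < q[s] else 0
        s += 1 if rng.random() < p_up else -1  # continuation step
    return 1 if s == N else 0

def success_rate(policy, n_rounds=20000, N=10, start=5, p_up=0.55, seed=0):
    """Empirical goal-reaching frequency of a fixed policy over many rounds."""
    rng = random.Random(seed)
    q = {s: s / N for s in range(1, N)}  # assumed terminal-success curve
    wins = sum(play_round(start, N, p_up, q, policy, rng)
               for _ in range(n_rounds))
    return wins / n_rounds

always_continue = lambda s: "continue"
always_terminate = lambda s: "terminate"
```

Under these particular (favorable-drift) assumptions, always continuing from the middle state outperforms terminating immediately; the paper's point is that the learner must discover such comparisons online, without knowing the transition probabilities.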
