SIAM Journal on Control and Optimization

THE CONTINUUM-ARMED BANDIT PROBLEM



Abstract

In this paper we consider the multiarmed bandit problem where the arms are chosen from a subset of the real line and the mean rewards are assumed to be a continuous function of the arms. The problem with an infinite number of arms is much more difficult than the usual one with a finite number of arms because the built-in learning task is now infinite dimensional. We devise a kernel estimator-based learning scheme for the mean reward as a function of the arms. Using this learning scheme, we construct a class of certainty equivalence control with forcing schemes and derive asymptotic upper bounds on their learning loss. To the best of our knowledge, these bounds are the strongest rates yet available. Moreover, they are stronger than the o(n) required for optimality with respect to the average-cost-per-unit-time criterion. [References: 28]
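The abstract describes two ingredients: a kernel estimator of the mean-reward function over the continuum of arms, and a certainty-equivalence rule that acts greedily on that estimate while occasionally "forcing" exploratory pulls. The Python sketch below illustrates that general pattern only; the Epanechnikov kernel, the bandwidth and forcing schedules, and the helper names `kernel_estimates` and `run` are illustrative assumptions and not the paper's exact scheme or rates.

```python
import numpy as np

rng = np.random.default_rng(0)

def epanechnikov(u):
    # Epanechnikov kernel, supported on [-1, 1] (illustrative choice).
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_estimates(grid, arms, rewards, h):
    # Nadaraya-Watson estimates of the mean reward at every grid point,
    # using the arms pulled so far and bandwidth h.
    u = (grid[:, None] - np.asarray(arms)[None, :]) / h
    w = epanechnikov(u)                        # shape (len(grid), len(arms))
    s = w.sum(axis=1)
    num = w @ np.asarray(rewards)
    return np.where(s > 0, num / np.maximum(s, 1e-12), 0.0)

def run(mean_reward, horizon=2000, noise=0.1):
    # Certainty equivalence with forcing: usually play the arm that maximizes
    # the current kernel estimate; at a vanishing rate, force a random arm.
    arms, rewards = [], []
    grid = np.linspace(0.0, 1.0, 201)          # candidate arms in [0, 1]
    for n in range(1, horizon + 1):
        h = n ** (-1 / 5)                      # shrinking bandwidth (illustrative rate)
        if not arms or rng.random() < n ** (-1 / 3):
            x = rng.uniform(0.0, 1.0)          # forcing step: explore a random arm
        else:
            est = kernel_estimates(grid, arms, rewards, h)
            x = float(grid[int(np.argmax(est))])   # greedy (certainty-equivalence) step
        arms.append(x)
        rewards.append(mean_reward(x) + noise * rng.standard_normal())
    return arms, rewards

# Example: a smooth mean-reward function peaked at x = 0.3.
arms, rewards = run(lambda x: 1.0 - (x - 0.3) ** 2)
```

Under this kind of scheme, the learning loss is driven by the forced exploratory pulls and by the bias and variance of the kernel estimate, which is why the achievable bounds depend on the smoothness of the mean-reward function.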


