An estimation based allocation rule with super-linear regret and finite lock-on time for time-dependent multi-armed bandit processes

IEEE Canadian Conference on Electrical and Computer Engineering

Abstract

The multi-armed bandit (MAB) problem has been an active area of research since the early 1930s. The majority of the literature restricts attention to i.i.d. or Markov reward processes. In this paper, the finite-parameter MAB problem with time-dependent reward processes is investigated. An upper confidence bound (UCB) based index policy is proposed, in which the index is computed from the maximum-likelihood estimate of the unknown parameter. This policy locks on to the optimal arm in finite expected time but incurs super-linear regret. As an example, the proposed index policy is used to minimize prediction error when each arm is an auto-regressive moving average (ARMA) process.
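To make the policy description concrete, the Python sketch below illustrates a generic UCB-style index rule in which each arm's index is a plug-in estimate of its mean reward (a sample mean standing in for the paper's maximum-likelihood estimate) plus an exploration bonus, applied to toy AR(1) prediction tasks standing in for the ARMA example. The specific index form, the function names, and the AR(1) setup are illustrative assumptions, not the paper's exact construction.

import numpy as np

def ucb_mle_index_policy(arms, horizon, confidence=2.0):
    """Generic UCB-style index policy sketch (not the paper's exact index).

    `arms` is a list of callables; arms[k]() returns the next reward of arm k.
    Each arm's index is a plug-in estimate of its mean reward (here a sample
    mean, standing in for a maximum-likelihood estimate of the unknown
    parameter) plus an exploration bonus that shrinks as the arm is sampled.
    """
    n_arms = len(arms)
    pulls = np.zeros(n_arms)        # number of times each arm was played
    reward_sum = np.zeros(n_arms)   # cumulative reward of each arm
    history = []

    for t in range(1, horizon + 1):
        if t <= n_arms:
            k = t - 1                        # play each arm once to initialise
        else:
            estimate = reward_sum / pulls    # plug-in (MLE-style) estimate
            bonus = np.sqrt(confidence * np.log(t) / pulls)
            k = int(np.argmax(estimate + bonus))
        r = arms[k]()
        pulls[k] += 1
        reward_sum[k] += r
        history.append((k, r))
    return history

# Illustrative arms: rewards are negative one-step squared prediction errors
# of simple AR(1) processes (a stand-in for the ARMA setting in the paper).
def make_ar1_arm(phi, sigma, rng):
    state = {"x": 0.0}
    def pull():
        pred = phi * state["x"]                      # one-step predictor
        nxt = phi * state["x"] + sigma * rng.normal()
        state["x"] = nxt
        return -(nxt - pred) ** 2                    # reward = -squared error
    return pull

rng = np.random.default_rng(0)
arms = [make_ar1_arm(0.5, 1.0, rng), make_ar1_arm(0.5, 0.3, rng)]
plays = ucb_mle_index_policy(arms, horizon=500)

The sample-mean index is used here only because the abstract does not specify the likelihood model; for a genuine ARMA arm, the plug-in estimate would instead come from fitting that arm's ARMA parameters to its observed history.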
