Current algorithms for solving the multi-armed bandit (MAB) problem often perform well under stationary observations. Although this performance may be acceptable when parameters are set accurately, most of these algorithms degrade under non-stationary observations. We set up an incremental ε-greedy model whose action-value function is a stochastic mean equation, which is more applicable to real-world problems. Unlike iterative algorithms that suffer from step-size dependency, we propose an adaptive step-size model (ASM) that yields an adaptive MAB algorithm. The proposed model employs the ε-greedy approach as its action-selection policy. In addition, a dynamic exploration parameter ε is introduced whose effect diminishes as the decision maker's intelligence increases. The proposed model is empirically evaluated and compared with existing algorithms, including the standard ε-greedy, Softmax, ε-decreasing, and UCB-Tuned models, under both stationary and non-stationary situations. ASM not only addresses the parameter-dependency problem but also performs comparably to or better than the aforementioned algorithms. Applying these enhancements to the standard ε-greedy model reduces learning time, making it attractive to a wide range of online sequential selection-based applications such as autonomous agents, adaptive control, industrial robots, and trend-forecasting problems in the management and economics domains.
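To make the baseline concrete, the following is a minimal sketch of the incremental ε-greedy scheme the abstract builds on: an agent that explores with probability ε and otherwise exploits, updating each arm's value estimate with an incremental mean. The class name, the fixed ε, and the 1/n step size are illustrative assumptions; the paper's ASM replaces the fixed step size with an adaptive one and decays ε dynamically, details of which are not reproduced here.

```python
import random


class EpsilonGreedyBandit:
    """Illustrative incremental epsilon-greedy agent (textbook form).

    The paper's ASM additionally adapts the step size and the
    exploration parameter epsilon; this sketch shows only the
    standard incremental baseline it modifies.
    """

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # estimated action values Q(a)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the
        # current greedy arm.
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental sample-average update: Q <- Q + (1/n)(r - Q).
        # For non-stationary rewards, a constant or adaptive step
        # size would replace 1/n here.
        self.counts[arm] += 1
        step = 1.0 / self.counts[arm]
        self.values[arm] += step * (reward - self.values[arm])
```

With ε = 0 the agent is purely greedy; raising ε trades exploitation for exploration, which is exactly the dependency the proposed dynamic ε is meant to manage.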