Comparing Multi-Armed Bandit Algorithms and Q-learning for Multiagent Action Selection: a Case Study in Route Choice

International Joint Conference on Neural Networks

Abstract

The multi-armed bandit (MAB) problem is concerned with an agent choosing which arm of a slot machine to play in order to optimize its reward. A family of reinforcement learning algorithms exists to tackle this problem, including a few variants that consider more than one agent (thus, characterizing a repeated game) and non-stationary variants. In this paper, we seek to evaluate the performance of some of these MAB algorithms and compare them with Q-learning when applied to a non-stationary repeated game, where commuter agents face the task of learning how to choose a route that minimizes their travel times.
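To make the setting concrete, below is a minimal sketch of the kind of comparison the abstract describes, using plain epsilon-greedy as a stand-in for the MAB side and one-state tabular Q-learning on the other. The route set, the linear congestion model, the reward definition (negative travel time), and all class names and parameter values are illustrative assumptions and do not reproduce the paper's experimental setup.

```python
import random
from collections import defaultdict

# Toy repeated route-choice game: each of N agents picks one of K routes per
# episode; travel time grows with the number of agents on the same route.
# All modelling choices here are illustrative assumptions, not the paper's setup.

N_AGENTS, N_ROUTES, EPISODES = 100, 3, 500
FREE_FLOW = [10.0, 12.0, 15.0]   # hypothetical free-flow travel times
CAPACITY = [40, 35, 25]          # hypothetical route capacities

def travel_time(route, load):
    # Simple linear congestion effect (assumed for illustration).
    return FREE_FLOW[route] * (1.0 + load / CAPACITY[route])

class EpsilonGreedyBandit:
    """Stateless MAB learner: tracks the sample-average reward of each route."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon, self.counts, self.values = epsilon, [0] * n_arms, [0.0] * n_arms
    def choose(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])
    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

class QLearningAgent:
    """One-state tabular Q-learning; with a single state it reduces to a
    recency-weighted bandit, which is what makes the comparison interesting."""
    def __init__(self, n_actions, alpha=0.1, epsilon=0.1):
        self.alpha, self.epsilon, self.q = alpha, epsilon, [0.0] * n_actions
    def choose(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])
    def update(self, action, reward):
        # Single state, no successor: the gamma * max Q(s') term is dropped.
        self.q[action] += self.alpha * (reward - self.q[action])

def run(agent_factory):
    agents = [agent_factory() for _ in range(N_AGENTS)]
    avg_times = []
    for _ in range(EPISODES):
        choices = [a.choose() for a in agents]
        loads = defaultdict(int)
        for c in choices:
            loads[c] += 1
        times = [travel_time(c, loads[c]) for c in choices]
        for agent, c, t in zip(agents, choices, times):
            agent.update(c, -t)   # reward = negative travel time
        avg_times.append(sum(times) / N_AGENTS)
    return avg_times[-1]

if __name__ == "__main__":
    random.seed(0)
    print("epsilon-greedy final avg travel time:",
          round(run(lambda: EpsilonGreedyBandit(N_ROUTES)), 2))
    print("Q-learning    final avg travel time:",
          round(run(lambda: QLearningAgent(N_ROUTES)), 2))
```

Because there is only one decision state, the practical difference lies in the update rule: the bandit's sample-average estimate weights all past rewards equally, while Q-learning's constant step size discounts older rewards, which matters when the other agents make the reward process non-stationary.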
