Comparing Multi-Armed Bandit Algorithms and Q-learning for Multiagent Action Selection: a Case Study in Route Choice

International Joint Conference on Neural Networks

Abstract

The multi-armed bandit (MAB) problem is concerned with an agent choosing which arm of a slot machine to play in order to optimize its reward. A family of reinforcement learning algorithms exists to tackle this problem, including a few variants that consider more than one agent (thus, characterizing a repeated game) and non-stationary variants. In this paper, we seek to evaluate the performance of some of these MAB algorithms and compare them with Q-learning when applied to a non-stationary repeated game, where commuter agents face the task of learning how to choose a route that minimizes their travel times.
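To make the setting concrete, below is a minimal sketch of the kind of comparison the abstract describes, using plain epsilon-greedy as a stand-in for the MAB side and one-state tabular Q-learning on the other. The route set, the linear congestion model, the reward definition (negative travel time), and all class names and parameter values are illustrative assumptions and do not reproduce the paper's experimental setup.

```python
import random
from collections import defaultdict

# Toy repeated route-choice game: each of N agents picks one of K routes per
# episode; travel time grows with the number of agents on the same route.
# All modelling choices here are illustrative assumptions, not the paper's setup.

N_AGENTS, N_ROUTES, EPISODES = 100, 3, 500
FREE_FLOW = [10.0, 12.0, 15.0]   # hypothetical free-flow travel times
CAPACITY = [40, 35, 25]          # hypothetical route capacities

def travel_time(route, load):
    # Simple linear congestion effect (assumed for illustration).
    return FREE_FLOW[route] * (1.0 + load / CAPACITY[route])

class EpsilonGreedyBandit:
    """Stateless MAB learner: tracks the sample-average reward of each route."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon, self.counts, self.values = epsilon, [0] * n_arms, [0.0] * n_arms
    def choose(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])
    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

class QLearningAgent:
    """One-state tabular Q-learning; with a single state it reduces to a
    recency-weighted bandit, which is what makes the comparison interesting."""
    def __init__(self, n_actions, alpha=0.1, epsilon=0.1):
        self.alpha, self.epsilon, self.q = alpha, epsilon, [0.0] * n_actions
    def choose(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda a: self.q[a])
    def update(self, action, reward):
        # Single state, no successor: the gamma * max Q(s') term is dropped.
        self.q[action] += self.alpha * (reward - self.q[action])

def run(agent_factory):
    agents = [agent_factory() for _ in range(N_AGENTS)]
    avg_times = []
    for _ in range(EPISODES):
        choices = [a.choose() for a in agents]
        loads = defaultdict(int)
        for c in choices:
            loads[c] += 1
        times = [travel_time(c, loads[c]) for c in choices]
        for agent, c, t in zip(agents, choices, times):
            agent.update(c, -t)   # reward = negative travel time
        avg_times.append(sum(times) / N_AGENTS)
    return avg_times[-1]

if __name__ == "__main__":
    random.seed(0)
    print("epsilon-greedy final avg travel time:",
          round(run(lambda: EpsilonGreedyBandit(N_ROUTES)), 2))
    print("Q-learning    final avg travel time:",
          round(run(lambda: QLearningAgent(N_ROUTES)), 2))
```

Because there is only one decision state, the practical difference lies in the update rule: the bandit's sample-average estimate weights all past rewards equally, while Q-learning's constant step size discounts older rewards, which matters when the other agents make the reward process non-stationary.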
