Brazilian Conference on Intelligent Systems

Individual versus Difference Rewards on Reinforcement Learning for Route Choice



Abstract

In transportation systems, drivers usually choose their routes based on their own knowledge of the network, obtained from previous trips. When drivers run into jams, they may change their routes to take a faster path. However, such re-routing may not be a good choice, because other drivers may react in the same way, and this behaviour can create jams on other links. On the other hand, if drivers build their routes aiming at minimizing the overall travel time (maximizing the system's utility) rather than their individual travel time (the agents' utility), the whole system may benefit. This work presents two reinforcement learning algorithms for solving the route choice problem in road networks. IQ-learning uses an individual reward function and aims at finding a policy that maximizes each agent's utility. DQ-learning, in contrast, shapes the agents' rewards with a difference rewards function and aims at finding routes that maximize the system's utility. Through experiments we show that DQ-learning is able to reduce the overall travel time when compared to other methods.
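The contrast the abstract draws can be illustrated with a minimal sketch. The network, latency functions, parameter values, and function names below are illustrative assumptions, not the paper's actual setup: twenty agents repeatedly pick one of two routes in a Pigou-style network (route 0 has fixed travel time, route 1 congests with load), learning stateless Q-values. The only difference between the two runs is the reward: the agent's own (negative) travel time, versus the difference reward D_i = G(z) - G(z_-i), i.e. the system utility with and without agent i.

```python
import random

N_AGENTS = 20

def latency(route, load):
    # Hypothetical Pigou-style network (an assumption, not the paper's):
    # route 0 has a fixed travel time of 1.0; route 1's travel time
    # grows linearly with the number of agents using it.
    return 1.0 if route == 0 else load / N_AGENTS

def total_travel_time(loads):
    # System cost: sum over all agents of the latency on their chosen route.
    return sum(loads[r] * latency(r, loads[r]) for r in range(len(loads)))

def simulate(use_difference, episodes=300, alpha=0.1, epsilon=0.1, seed=42):
    rng = random.Random(seed)
    n_routes = 2
    # Stateless Q-learning: one Q-value per (agent, route) pair.
    Q = [[0.0] * n_routes for _ in range(N_AGENTS)]
    totals = []
    for _ in range(episodes):
        # Each agent picks a route epsilon-greedily.
        acts = [rng.randrange(n_routes) if rng.random() < epsilon
                else max(range(n_routes), key=Q[i].__getitem__)
                for i in range(N_AGENTS)]
        loads = [acts.count(r) for r in range(n_routes)]
        G = -total_travel_time(loads)  # global (system) utility
        for i, a in enumerate(acts):
            if use_difference:
                # Difference reward: D_i = G(z) - G(z without agent i).
                loads_wo = loads[:]
                loads_wo[a] -= 1
                r = G - (-total_travel_time(loads_wo))
            else:
                # Individual reward: the agent's own negative travel time.
                r = -latency(a, loads[a])
            Q[i][a] += alpha * (r - Q[i][a])
        totals.append(total_travel_time(loads))
    # Average system travel time over the last 50 episodes, once settled.
    return sum(totals[-50:]) / 50

iq = simulate(use_difference=False)  # individual rewards (IQ-like)
dq = simulate(use_difference=True)   # difference rewards (DQ-like)
```

In this toy network the selfish equilibrium piles everyone onto the congestible route (total time near 20), while the difference reward makes joining that route unattractive exactly when one more agent hurts the system, so the population settles near the even split that minimizes total travel time.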


