Brazilian Conference on Intelligent Systems

Individual versus Difference Rewards on Reinforcement Learning for Route Choice



Abstract

In transportation systems, drivers usually choose their routes based on their own knowledge of the network, obtained from previous trips. When drivers run into jams, they may change their routes to take a faster path. However, such re-routing may not be a good choice, because other drivers may react in the same way, and this behaviour can create jams on other links. On the other hand, if drivers build their routes aiming at minimizing the overall travel time (maximizing the system's utility) rather than their individual travel time (the agents' utility), the whole system may benefit. This work presents two reinforcement learning algorithms for solving the route choice problem in road networks. IQ-learning uses an individual reward function and aims at finding a policy that maximizes each agent's utility. DQ-learning, in contrast, shapes the agents' rewards with a difference rewards function and aims at finding routes that maximize the system's utility. Through experiments we show that DQ-learning is able to reduce the overall travel time when compared to other methods.
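The contrast the abstract draws can be illustrated with a minimal sketch. The network, latency functions, parameter values, and function names below are illustrative assumptions, not the paper's actual setup: twenty agents repeatedly pick one of two routes in a Pigou-style network (route 0 has fixed travel time, route 1 congests with load), learning stateless Q-values. The only difference between the two runs is the reward: the agent's own (negative) travel time, versus the difference reward D_i = G(z) - G(z_-i), i.e. the system utility with and without agent i.

```python
import random

N_AGENTS = 20

def latency(route, load):
    # Hypothetical Pigou-style network (an assumption, not the paper's):
    # route 0 has a fixed travel time of 1.0; route 1's travel time
    # grows linearly with the number of agents using it.
    return 1.0 if route == 0 else load / N_AGENTS

def total_travel_time(loads):
    # System cost: sum over all agents of the latency on their chosen route.
    return sum(loads[r] * latency(r, loads[r]) for r in range(len(loads)))

def simulate(use_difference, episodes=300, alpha=0.1, epsilon=0.1, seed=42):
    rng = random.Random(seed)
    n_routes = 2
    # Stateless Q-learning: one Q-value per (agent, route) pair.
    Q = [[0.0] * n_routes for _ in range(N_AGENTS)]
    totals = []
    for _ in range(episodes):
        # Each agent picks a route epsilon-greedily.
        acts = [rng.randrange(n_routes) if rng.random() < epsilon
                else max(range(n_routes), key=Q[i].__getitem__)
                for i in range(N_AGENTS)]
        loads = [acts.count(r) for r in range(n_routes)]
        G = -total_travel_time(loads)  # global (system) utility
        for i, a in enumerate(acts):
            if use_difference:
                # Difference reward: D_i = G(z) - G(z without agent i).
                loads_wo = loads[:]
                loads_wo[a] -= 1
                r = G - (-total_travel_time(loads_wo))
            else:
                # Individual reward: the agent's own negative travel time.
                r = -latency(a, loads[a])
            Q[i][a] += alpha * (r - Q[i][a])
        totals.append(total_travel_time(loads))
    # Average system travel time over the last 50 episodes, once settled.
    return sum(totals[-50:]) / 50

iq = simulate(use_difference=False)  # individual rewards (IQ-like)
dq = simulate(use_difference=True)   # difference rewards (DQ-like)
```

In this toy network the selfish equilibrium piles everyone onto the congestible route (total time near 20), while the difference reward makes joining that route unattractive exactly when one more agent hurts the system, so the population settles near the even split that minimizes total travel time.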


