On the Role of Reward Functions for Reinforcement Learning in the Traffic Assignment Problem

Abstract

The traffic assignment problem (TAP) consists of assigning routes to road users in order to minimize traffic congestion. Traditional methods for solving the TAP assume the existence of a central authority who computes and dictates routes to road users. Multi-agent reinforcement learning (MARL) approaches are more realistic in solving this kind of problem because they consider that road users (agents) have complete autonomy for choosing routes. However, MARL approaches usually require a long training period in order to compute the optimal routes, which could be a major limitation in more realistic traffic scenarios. In this paper, we tackle this problem by evaluating the performance of three conceptually different reward functions, namely: expert-designed rewards, difference rewards, and intrinsically motivated rewards. In particular, our focus lies on providing a deeper understanding of the impact of these reward functions on the agents’ performance, thus contributing towards reducing congestion levels. To this end, we perform an extensive experimental evaluation on different road networks, including up to 360,600 concurrently learning agents. Our results show that, although the adopted reward functions were not able to speed up the learning process, the correct reward function choice plays an important role in the quality of the learned solution.
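
As a rough illustration of how two of the reward types named in the abstract can differ, the sketch below compares a plain travel-time reward with a difference reward on a toy network. Everything in it is an assumption made for illustration and is not taken from the paper: the two-route network, the linear cost functions, the function names, and the choice of negative total travel time as the system utility G. The difference reward uses the standard form D_i = G(z) - G(z_-i), where z_-i is the joint choice with agent i removed. The travel-time reward is only one plausible shape an expert-designed reward could take; the abstract does not specify it.

```python
from collections import Counter

# Hypothetical two-route network (not from the paper):
# travel time on a route = free_flow + slope * flow.
ROUTES = {"A": (10.0, 1.0), "B": (15.0, 0.5)}


def travel_time(route, flow):
    """Travel time experienced on `route` when `flow` agents use it."""
    free_flow, slope = ROUTES[route]
    return free_flow + slope * flow


def system_utility(choices):
    """G(z): negative total travel time summed over all agents (assumed)."""
    flows = Counter(choices)
    return -sum(n * travel_time(route, n) for route, n in flows.items())


def travel_time_reward(choices, i):
    """Reward for agent i based only on its own (negated) travel time."""
    flows = Counter(choices)
    return -travel_time(choices[i], flows[choices[i]])


def difference_reward(choices, i):
    """D_i = G(z) - G(z_-i): agent i's marginal effect on the system."""
    without_i = choices[:i] + choices[i + 1:]
    return system_utility(choices) - system_utility(without_i)


# Three agents on route A, one on route B.
choices = ["A", "A", "A", "B"]
for i in range(len(choices)):
    print(i, choices[i], travel_time_reward(choices, i), difference_reward(choices, i))
```

In this toy setting, an agent on route A experiences a travel time of 13 but receives a difference reward of -15, because the difference reward also charges it for the extra delay it imposes on the two other agents sharing that route. The lone agent on route B delays nobody else, so both rewards coincide at -15.5.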

Bibliographic record

Source
International Joint Conference on Neural Networks | 2020 | pp. 1-9 | 9 pages
Authors
Ricardo Grunitzki; Gabriel de Oliveira Ramos
Original format
PDF
Keywords
Roads; Task analysis; Reinforcement learning; Vehicles; Decision making; Supply and demand

Similar literature

1. Adaptive Traffic Signal Control: Exploring Reward Definition For Reinforcement Learning [J]. Saad Touhbi, Mohamed Ait Babram, Tri Nguyen-Huu. Procedia Computer Science. 2017, No. 1
2. Prediction of Reward Functions for Deep Reinforcement Learning via Gaussian Process Regression [J]. Lim Jaehyun, Ha Seungchul, Choi Jongeun. IEEE/ASME Transactions on Mechatronics. 2020, No. 4
3. Reinforcement learning vs. rule-based adaptive traffic signal control: A Fourier basis linear function approximation for traffic signal control [J]. Ziemke Theresa, Alegre Lucas N., Bazzan Ana L.C. AI Communications. 2021, No. 1
4. Assessment of Reward Functions for Reinforcement Learning Traffic Signal Control under Real-World Limitations [C]. Alvaro Cabrejas Egea, Shaun Howell, Maksis Knutins. IEEE International Conference on Systems, Man, and Cybernetics. 2020
5. Deep Reinforcement Learning with Accelerated Reward Function Technique for Robotics Task Planning [D]. Shaikh, Shifa. 2021
6. Heads for learning tails for memory: reward reinforcement and a role of dopamine in determining behavioral relevance across multiple timescales [O]. Mathieu Baudonnat, Anna Huber, Vincent David. 2013
7. A Distributed Assignment Method for Dynamic Traffic Assignment Using Heterogeneous-Adviser Based Multi-Agent Reinforcement Learning [O]. Zhaotian Pan, Zhaowei Qu, Yongheng Chen. 2020
