Journal: Control Engineering Practice

Evaluating semi-cooperative Nash/Stackelberg Q-learning for traffic routes plan in a single intersection


Abstract

As traffic congestion becomes increasingly severe and frequent in urban transportation systems, many models based on reinforcement learning (RL) have been proposed to alleviate it. The traffic problem can be cast as a multi-agent reinforcement learning (MARL) system in which the incoming links (i.e., road sections) are regarded as agents and the agents' actions control the signal lights. This paper proposes a semi-cooperative Nash Q-learning approach built on single-agent Q-learning and Nash equilibrium: the agents select joint actions according to a Nash equilibrium, but behave cooperatively toward a common goal when more than one Nash equilibrium exists. An extended variant, semi-cooperative Stackelberg Q-learning, is then designed for comparison, replacing the Nash equilibrium with a Stackelberg equilibrium in the Q-learning process. Specifically, the agent with the longest queue is promoted to leader, while the others act as followers that react to the leader's decision. Rather than adjusting green-light timing plans as in previous research, this paper seeks the multi-route plan that passes the most vehicles through a single traffic intersection, combining game theory and RL for decision-making in the multi-agent framework. Both multi-agent Q-learning methods are implemented and compared with a constant strategy (i.e., the green and red intervals are fixed and periodic). Simulation results show that semi-cooperative Stackelberg Q-learning performs better.
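The abstract describes the Stackelberg variant only verbally. The sketch below is a minimal illustration, under assumptions of our own (discrete hashable states, one Q-table per agent keyed by its own action and the leader's action, ε-greedy exploration), of how the leader-follower action selection and Q update could be wired together; the class, method, and parameter names are illustrative and not taken from the paper.

```python
import random
from collections import defaultdict

class StackelbergQLearner:
    """Minimal sketch (not the authors' code) of semi-cooperative Stackelberg
    Q-learning for a single intersection: each incoming link is an agent, the
    agent with the longest queue is promoted to leader, and the remaining
    agents best-respond to the leader's announced action."""

    def __init__(self, n_agents, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.n_agents, self.n_actions = n_agents, n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # One Q-table per agent, keyed by (state, own_action, leader_action).
        # The state must be hashable, e.g. a tuple of discretised queue lengths.
        self.Q = [defaultdict(float) for _ in range(n_agents)]

    def greedy_joint_action(self, state, queue_lengths):
        # The most congested incoming link acts as the Stackelberg leader.
        leader = max(range(self.n_agents), key=lambda i: queue_lengths[i])
        # The leader picks the action with the highest own Q-value ...
        leader_action = max(range(self.n_actions),
                            key=lambda a: self.Q[leader][(state, a, a)])
        # ... and every follower best-responds to that announced action.
        actions = [leader_action if i == leader else
                   max(range(self.n_actions),
                       key=lambda a: self.Q[i][(state, a, leader_action)])
                   for i in range(self.n_agents)]
        return actions, leader

    def select_actions(self, state, queue_lengths):
        # Epsilon-greedy exploration around the Stackelberg joint action.
        if random.random() < self.epsilon:
            leader = max(range(self.n_agents), key=lambda i: queue_lengths[i])
            return ([random.randrange(self.n_actions)
                     for _ in range(self.n_agents)], leader)
        return self.greedy_joint_action(state, queue_lengths)

    def update(self, state, actions, leader, rewards, next_state, next_queues):
        # One-step Q backup: the next-state value is evaluated at the greedy
        # Stackelberg joint action that would be played in next_state.
        next_actions, next_leader = self.greedy_joint_action(next_state, next_queues)
        for i in range(self.n_agents):
            key = (state, actions[i], actions[leader])
            nxt = (next_state, next_actions[i], next_actions[next_leader])
            target = rewards[i] + self.gamma * self.Q[i][nxt]
            self.Q[i][key] += self.alpha * (target - self.Q[i][key])
```

A training loop would call `select_actions`, apply the resulting plan in the traffic simulator, observe queue-based rewards, and then call `update`. The semi-cooperative Nash variant described in the abstract would differ only in how the joint action is chosen: from a Nash equilibrium of the stage game, with a shared cooperative tie-breaking rule when several equilibria exist.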
