
MASTraf: A decentralized multi-agent system for network-wide traffic signal control with dynamic coordination



Abstract

Continuous increases in traffic volume and limited available capacity in the roadway system have created a need for improved traffic control. From traditional pre-timed isolated signals to actuated and coordinated corridors, traffic control for urban networks has evolved into more complex adaptive signal control systems. However, unexpected traffic fluctuations, rapid changes in traffic demands, oversaturation, the occurrence of incidents, and adverse weather conditions, among others, significantly impact traffic network operation in ways that current control systems cannot always cope with.

On the other hand, strategies for traffic control based on developments from the field of machine learning can provide promising alternative solutions, particularly those that make use of unsupervised learning such as reinforcement learning (RL), also referred to as approximate dynamic programming (ADP) in some research communities. For the traffic control problem, examples of convenient RL algorithms are off-policy Q-learning and ADP using a post-decision state variable, since they address processes with sequential decision making, do not need to compute transition probabilities, and are well suited for high-dimensional spaces.

A series of benefits are expected from these algorithms in the traffic control domain: 1) no need for prediction models to transition traffic over time and estimate the best actions; 2) availability of cost-to-go estimates at any time (appropriate for real-time applications); 3) self-evolving policies; and 4) flexibility to make use of new sources of information that are part of emerging Intelligent Transportation Systems (ITS), such as mobile vehicle detectors (Bluetooth and GPS vehicle locators).

Given the potential benefits of these strategies, this research proposes MASTraf: a decentralized Multi-Agent System for network-wide Traffic signal control with dynamic coordination. MASTraf is designed to capture the behavior of the environment and make decisions based on situations directly observed by RL agents. Agents can also communicate with each other, exploring the effects of temporary coalitions or subgroups of intersections as a mechanism for coordination.

Separate MASTraf implementations with similar state and reward functions using Q-learning and ADP were tested using a microscopic traffic simulator (VISSIM) and real-time manipulation of the traffic signals through the software's COM interface. Testing was conducted to determine the performance of the agents in scenarios of increasing complexity, from a single intersection to arterials and networks, in both undersaturated and oversaturated conditions.

Results show that the multi-agent system provided by MASTraf improves its performance as the agents accumulate experience, and the system was able to efficiently manage the traffic signals of simple and complex scenarios. Exploration of the policies generated by MASTraf showed that the agents followed expected behavior by providing green to greater vehicle demands and accounting for the effects of blockages and lost time. The performance of MASTraf was on par with current state-of-practice tools for finding signal control settings, but MASTraf can also adapt to changes in demands and driver behavior by adjusting the signal timings in real time, thus improving coordination and preventing queue spillbacks and green starvation.

A strategy for signal coordination was also tested in one of the MASTraf implementations, showing increased throughput and a reduced number of stops, as expected. The coordination employed a version of the max-plus algorithm embedded in the reward structure, acting as a bias towards improved coordination. The response of the system using imprecise detector data, in the form of coarse aggregation, showed that the system was able to handle oversaturation under such conditions. Even when the data had only 25% of the resolution of the original implementation, the system throughput was reduced by only 5% and the number of stops per vehicle increased by 8%.

The state and reward formulations allowed for a simple function approximation method that reduces the memory required to store the state space and also provides a form of generalization for states that have not been visited or have not been experienced enough. Given the discontinuities in the reward function generated by penalties for blockages and lost times, the value approximation was conducted through a series of functions for each action and for each of the conditions before and after a discontinuity. The policies generated using MASTraf with function approximation were analyzed for different intersections in the network, showing agent behavior that reflected the principles formulated in the original problem using lookup tables, including right-of-way assignment based on expected rewards with consideration of penalties such as lost time. In terms of system performance, MASTraf with function approximation resulted in an average reduction of 1% in total system throughput and an average increase of 3.6% in the number of stops per vehicle, compared with the implementation using lookup tables on a congested network of 20 intersections.
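As an illustration of the off-policy Q-learning mentioned in the abstract, the following is a minimal tabular sketch for a single two-phase intersection. It is not the MASTraf implementation: the toy queue environment, the state (two queue lengths plus the current phase), the reward (negative total queue), the one-step lost-time penalty for switching phases, and every name and parameter value (`step`, `q_update`, `train`, `MAX_QUEUE`, arrival probabilities, `alpha`, `gamma`, `epsilon`) are assumptions made for this example. No VISSIM or COM calls are involved.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)   # 0 = green for approach 0, 1 = green for approach 1
MAX_QUEUE = 10     # hypothetical cap that keeps the toy state space small


def step(state, action, rng):
    """Toy intersection model (an assumption, not the dissertation's simulator).

    state = (queue_0, queue_1, current_phase). Keeping the current phase
    serves up to two vehicles; switching phases serves none, which stands
    in for lost time.
    """
    queues = [state[0], state[1]]
    current_phase = state[2]
    if action == current_phase:
        queues[action] -= min(queues[action], 2)
    # Random arrivals, heavier on approach 0 (illustrative probabilities).
    for i, p in enumerate((0.6, 0.3)):
        if rng.random() < p:
            queues[i] = min(queues[i] + 1, MAX_QUEUE)
    reward = -(queues[0] + queues[1])  # fewer queued vehicles is better
    return (queues[0], queues[1], action), reward


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning update. It is off-policy because the target uses the
    greedy value of the next state, whatever action is taken next."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def train(steps=20000, epsilon=0.1, seed=0):
    """Epsilon-greedy interaction loop; returns the learned Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    state = (0, 0, 0)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state, reward = step(state, action, rng)
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q
```

Calling `train()` returns a dictionary keyed by `(state, action)`. Note that the update needs no transition probabilities, only observed transitions, which is the property the abstract cites as a reason to use this family of algorithms. A lookup table like this grows with the number of distinct states, which is the memory cost that the function approximation described in the abstract is meant to reduce.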

Bibliographic record

  • Author: Medina, Juan C.
  • Author affiliation: University of Illinois at Urbana-Champaign
  • Degree-granting institution: University of Illinois at Urbana-Champaign
  • Subject: Engineering, Civil
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 159
  • Original format: PDF
  • Language: English
  • Date indexed: 2022-08-17 11:41:19
