Transportation Research Part B: Methodological

An actor-critic deep reinforcement learning approach for metro train scheduling with rolling stock circulation under stochastic demand

Abstract

This paper presents a novel actor-critic deep reinforcement learning approach for metro train scheduling with circulation of limited rolling stock. The scheduling problem is modeled as a Markov decision process driven by stochastic passenger demand. As in most dynamic optimization problems, the complexity of the scheduling process grows exponentially with the number of states, decisions, and uncertainties involved. This study addresses this 'curse of dimensionality' by adopting an actor-critic deep reinforcement learning solution framework. The framework simplifies the evaluation of, and search for, potential optimal solutions by parameterizing the original state and decision spaces with artificial neural networks. A deep deterministic policy gradient algorithm is developed to train the artificial neural networks on simulated system transitions before the actor-critic agent is applied to online schedule control. The proposed approach is tested on a real-world scenario configured with data collected from the Victoria Line of the London Underground, UK. Experimental results illustrate the advantages of the proposed method over a range of established meta-heuristics in terms of computing time, system efficiency, and robustness under different stochastic environments. This study brings state-of-the-art computer science and dynamic optimization techniques to urban transit operations. (C) 2020 Elsevier Ltd. All rights reserved.
