
Optimal passenger-seeking policies on E-hailing platforms using Markov decision process and imitation learning



Abstract

Vacant taxi drivers' passenger-seeking process in a road network generates additional vehicle miles traveled, adding congestion to the road network and pollution to the environment. This paper employs a Markov Decision Process (MDP) to model idle e-hailing drivers' optimal sequential decisions in passenger seeking. Transportation network company (TNC), or e-hailing (e.g., Didi, Uber), drivers behave differently from traditional taxi drivers because e-hailing drivers do not need to physically search for passengers. Instead, they reposition themselves so that the matching platform can pair them with a passenger. Accordingly, we incorporate these distinctive features of e-hailing drivers into our MDP model. The reward function used in the MDP model is recovered by leveraging an inverse reinforcement learning technique. We then use 44,160 Didi drivers' 3-day trajectories to train the model. To validate the effectiveness of the model, a Monte Carlo simulation is conducted in which drivers operate under the guidance of the optimal policy, and their performance is compared with that of drivers following a baseline heuristic, namely the local hotspot strategy. The results show that our model achieves a 17.5% improvement over the local hotspot strategy in terms of the rate of return. The proposed MDP model captures the supply-demand ratio; because the number of drivers in this study is sufficiently large, the number of unmatched orders is assumed to be negligible. To better incorporate the competition among multiple drivers into the model, we have also devised and calibrated a dynamic adjustment strategy for the order-matching probability.
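The abstract describes a finite-horizon sequential decision problem: an idle driver repeatedly chooses where to reposition, trading repositioning cost against the zone- and time-dependent probability of being matched. The backward-induction sketch below illustrates that structure only; the toy zone grid, the matching probabilities `p_match`, the fares, and the simplification that a successful match ends the seeking episode are all illustrative assumptions, not the paper's specification (the paper instead recovers the reward function from Didi trajectories via inverse reinforcement learning and adjusts the matching probability dynamically).

```python
import numpy as np

# Minimal sketch of the passenger-seeking MDP described in the abstract.
# Everything below (3x3 zone grid, matching probabilities, fares, costs)
# is a hypothetical placeholder, not the paper's calibrated model.

N_ZONES = 9    # toy 3x3 grid of pickup zones
N_STEPS = 48   # decision epochs in a shift (e.g., 15-min slots)
GAMMA = 0.99   # discount factor

rng = np.random.default_rng(0)
p_match = rng.uniform(0.05, 0.6, size=(N_STEPS, N_ZONES))  # P(matched | epoch, zone)
fare = rng.uniform(8.0, 25.0, size=N_ZONES)                # expected fare if matched
MOVE_COST = 1.0                                            # cost of one repositioning move

def neighbors(z):
    """Zones reachable in one epoch from z (stay put or move one cell)."""
    r, c = divmod(z, 3)
    out = [z]
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < 3 and 0 <= cc < 3:
            out.append(rr * 3 + cc)
    return out

# Backward induction: V[t, z] is an idle driver's optimal expected earnings
# from epoch t onward when seeking in zone z. A match pays the local fare
# and, in this simplified sketch, ends the seeking episode.
V = np.zeros((N_STEPS + 1, N_ZONES))
policy = np.zeros((N_STEPS, N_ZONES), dtype=int)
for t in range(N_STEPS - 1, -1, -1):
    for z in range(N_ZONES):
        best_q, best_a = -np.inf, z
        for nz in neighbors(z):
            cost = 0.0 if nz == z else MOVE_COST
            q = (-cost + p_match[t, nz] * fare[nz]
                 + (1.0 - p_match[t, nz]) * GAMMA * V[t + 1, nz])
            if q > best_q:
                best_q, best_a = q, nz
        V[t, z], policy[t, z] = best_q, best_a

print("Repositioning target for zone 0 at epoch 0:", policy[0, 0])
```

A Monte Carlo rollout of `policy` against a local hotspot heuristic, as in the paper's validation, would slot naturally on top of this structure.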

Bibliographic details

  • Source
    《Transportation research》 | 2020, Issue 2 | pp. 91-113 | 23 pages
  • Authors

  • Author affiliations

    Columbia Univ, Dept Civil Engn & Engn Mech, New York, NY 10027, USA;

    Columbia Univ, Dept Civil Engn & Engn Mech, New York, NY 10027, USA | Columbia Univ, Data Sci Inst, New York, NY 10027, USA;

    Didi Chuxing Inc, Beijing, Peoples R China;

    Tongji Univ, Natl Maglev Transportat Engn R&D Ctr, Shanghai, Peoples R China;

    Univ Michigan, Transportat Res Inst, Ann Arbor, MI 48109, USA | Univ Michigan, Ford Sch Publ Policy, Ann Arbor, MI 48109, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • CLC classification
  • Keywords

    Markov Decision Process (MDP); Imitation learning; E-hailing;

