
Optimal passenger-seeking policies on E-hailing platforms using Markov decision process and imitation learning



Abstract

Vacant taxi drivers' passenger-seeking process in a road network generates additional vehicle miles traveled, adding congestion to the road network and pollution to the environment. This paper employs a Markov Decision Process (MDP) to model idle e-hailing drivers' optimal sequential decisions in passenger seeking. Transportation network company (TNC), or e-hailing (e.g., Didi, Uber), drivers behave differently from traditional taxi drivers because e-hailing drivers do not need to physically search for passengers. Instead, they reposition themselves so that the matching platform can pair them with a passenger. Accordingly, we incorporate these distinctive features of e-hailing drivers into our MDP model. The reward function used in the MDP model is recovered by leveraging an inverse reinforcement learning technique. We then use 44,160 Didi drivers' 3-day trajectories to train the model. To validate the effectiveness of the model, a Monte Carlo simulation is conducted in which drivers operate under the guidance of the optimal policy, and their performance is compared with that of drivers following a baseline heuristic, namely the local hotspot strategy. The results show that our model achieves a 17.5% improvement over the local hotspot strategy in terms of the rate of return. The proposed MDP model captures the supply-demand ratio; because the number of drivers in this study is sufficiently large, the number of unmatched orders is assumed to be negligible. To better incorporate the competition among multiple drivers into the model, we have also devised and calibrated a dynamic adjustment strategy for the order-matching probability.
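The abstract describes a finite-horizon sequential decision problem: an idle driver repeatedly chooses where to reposition, trading repositioning cost against the zone- and time-dependent probability of being matched. The backward-induction sketch below illustrates that structure only; the toy zone grid, the matching probabilities `p_match`, the fares, and the simplification that a successful match ends the seeking episode are all illustrative assumptions, not the paper's specification (the paper instead recovers the reward function from Didi trajectories via inverse reinforcement learning and adjusts the matching probability dynamically).

```python
import numpy as np

# Minimal sketch of the passenger-seeking MDP described in the abstract.
# Everything below (3x3 zone grid, matching probabilities, fares, costs)
# is a hypothetical placeholder, not the paper's calibrated model.

N_ZONES = 9    # toy 3x3 grid of pickup zones
N_STEPS = 48   # decision epochs in a shift (e.g., 15-min slots)
GAMMA = 0.99   # discount factor

rng = np.random.default_rng(0)
p_match = rng.uniform(0.05, 0.6, size=(N_STEPS, N_ZONES))  # P(matched | epoch, zone)
fare = rng.uniform(8.0, 25.0, size=N_ZONES)                # expected fare if matched
MOVE_COST = 1.0                                            # cost of one repositioning move

def neighbors(z):
    """Zones reachable in one epoch from z (stay put or move one cell)."""
    r, c = divmod(z, 3)
    out = [z]
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < 3 and 0 <= cc < 3:
            out.append(rr * 3 + cc)
    return out

# Backward induction: V[t, z] is an idle driver's optimal expected earnings
# from epoch t onward when seeking in zone z. A match pays the local fare
# and, in this simplified sketch, ends the seeking episode.
V = np.zeros((N_STEPS + 1, N_ZONES))
policy = np.zeros((N_STEPS, N_ZONES), dtype=int)
for t in range(N_STEPS - 1, -1, -1):
    for z in range(N_ZONES):
        best_q, best_a = -np.inf, z
        for nz in neighbors(z):
            cost = 0.0 if nz == z else MOVE_COST
            q = (-cost + p_match[t, nz] * fare[nz]
                 + (1.0 - p_match[t, nz]) * GAMMA * V[t + 1, nz])
            if q > best_q:
                best_q, best_a = q, nz
        V[t, z], policy[t, z] = best_q, best_a

print("Repositioning target for zone 0 at epoch 0:", policy[0, 0])
```

A Monte Carlo rollout of `policy` against a local hotspot heuristic, as in the paper's validation, would slot naturally on top of this structure.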

Bibliographic details

  • Source
    《Transportation research》 | 2020, Issue 2 | pp. 91-113 | 23 pages
  • Authors

  • Author affiliations

    Columbia Univ, Dept Civil Engn & Engn Mech, New York, NY 10027, USA;

    Columbia Univ, Dept Civil Engn & Engn Mech, New York, NY 10027, USA | Columbia Univ, Data Sci Inst, New York, NY 10027, USA;

    Didi Chuxing Inc, Beijing, Peoples R China;

    Tongji Univ, Natl Maglev Transportat Engn R&D Ctr, Shanghai, Peoples R China;

    Univ Michigan, Transportat Res Inst, Ann Arbor, MI 48109, USA | Univ Michigan, Ford Sch Publ Policy, Ann Arbor, MI 48109, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • CLC classification
  • Keywords

    Markov Decision Process (MDP); Imitation learning; E-hailing;

