International Journal of Geographical Information Science

Optimize taxi driving strategies based on reinforcement learning

Abstract

The efficiency of taxi services in big cities influences not only the convenience of people's travel but also urban traffic and the profits of taxi drivers. To balance taxicab demand and supply, spatio-temporal knowledge mined from historical trajectories can help passengers find an available taxicab and help cabdrivers estimate the location of the next passenger. However, taxi trajectories are long sequences, and single-step optimization cannot guarantee a global optimum. Taking long-term revenue as the goal, a novel method based on reinforcement learning is proposed to optimize taxi driving strategies for global profit maximization. The optimization problem is formulated as a Markov decision process over the whole taxi driving sequence. The state in this model is defined by the taxi's location and operation status. The action set includes the operation choices of driving empty, carrying passengers, or waiting, together with the subsequent driving behaviors. The reward, which serves as the objective function for evaluating driving policies, is defined as the effective driving ratio, a measure of a cabdriver's total profit over a working day. The optimal choice for cabdrivers at any location is learned by the Q-learning algorithm, which maximizes cumulative rewards. Experiments on historical trajectory data from Beijing were conducted to test the accuracy and efficiency of the method. The results show that the method improves profits and efficiency for cabdrivers and also increases passengers' opportunities to find taxis. By replacing the reward function with other criteria, the method can also be used to discover and investigate novel spatial patterns. The new model requires no prior knowledge and is globally optimal, which gives it advantages over previous methods.
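The MDP described above maps naturally onto tabular Q-learning. The following is a minimal sketch of that setup, not the authors' implementation: the state is a (location, operation status) pair, the actions mirror the abstract's driving-empty / carrying / waiting choices, and the update maximizes cumulative reward. The toy road network, pickup probabilities, per-step rewards, and all names here are illustrative assumptions; in the paper, transitions and the effective-driving-ratio reward are derived from historical Beijing trajectories.

```python
import random
from collections import defaultdict

# Toy 1-D road network with made-up pickup probabilities (assumptions, not
# the paper's parameters, which come from historical trajectory data).
N_LOCATIONS = 10
PICKUP_PROB = [0.1, 0.3, 0.5, 0.2, 0.1, 0.4, 0.6, 0.2, 0.1, 0.3]

def valid_actions(status):
    # A vacant taxi may cruise empty or wait; an occupied taxi carries its passenger.
    return ["carry"] if status == "occupied" else ["cruise", "wait"]

def step(state, action):
    """Toy transition/reward model standing in for trajectory-derived estimates."""
    loc, status = state
    if action == "carry":
        reward = 1.0                                    # fare revenue per step
        loc = (loc + random.choice([-1, 1])) % N_LOCATIONS
        status = "vacant" if random.random() < 0.3 else "occupied"  # drop-off chance
    elif action == "cruise":
        reward = -0.2                                   # fuel cost of empty driving
        loc = (loc + random.choice([-1, 1])) % N_LOCATIONS
        status = "occupied" if random.random() < PICKUP_PROB[loc] else "vacant"
    else:                                               # wait in place
        reward = -0.05
        status = "occupied" if random.random() < PICKUP_PROB[loc] else "vacant"
    return (loc, status), reward

Q = defaultdict(float)                  # Q[(state, action)] -> estimated value
alpha, gamma, epsilon = 0.1, 0.95, 0.1  # learning rate, discount, exploration

for episode in range(5000):
    state = (random.randrange(N_LOCATIONS), "vacant")
    for t in range(200):                # one simulated working day
        acts = valid_actions(state[1])
        if random.random() < epsilon:   # epsilon-greedy exploration
            action = random.choice(acts)
        else:
            action = max(acts, key=lambda a: Q[(state, a)])
        nxt, r = step(state, action)
        best_next = max(Q[(nxt, a)] for a in valid_actions(nxt[1]))
        # Q-learning update toward the maximum cumulative reward.
        Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
        state = nxt

# The greedy policy tells a vacant driver whether to cruise or wait at each location.
for loc in range(N_LOCATIONS):
    s = (loc, "vacant")
    print(loc, max(["cruise", "wait"], key=lambda a: Q[(s, a)]))
```

Swapping the per-step profit reward for another criterion, as the abstract notes, turns the same learning loop into a tool for surfacing other spatial patterns.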
