International Conference on Autonomous Agents and Multiagent Systems

Can Sophisticated Dispatching Strategy Acquired by Reinforcement Learning?: A Case Study in Dynamic Courier Dispatching System



Abstract

In this paper, we study a courier dispatching problem (CDP) arising from an online pickup-service platform of Alibaba. The CDP aims to assign a set of couriers to serve pickup requests with stochastic spatial and temporal arrival rates across urban regions. The objective is to maximize the revenue of served requests given a limited number of couriers over a period of time. Many online algorithms from the existing literature, such as dynamic matching and vehicle routing strategies, could be applied to tackle this problem. However, these methods rely on appropriately predefined optimization objectives at each decision point, which are hard to specify in dynamic situations. This paper formulates the CDP as a Markov decision process (MDP) and proposes a data-driven approach to derive the optimal dispatching rule-set under different scenarios. Our method stacks multi-layer images of the spatial-temporal map and applies multi-agent reinforcement learning (MARL) techniques to evolve dispatching models. This method resolves the learning inefficiency caused by traditional centralized MDP modeling. Through comprehensive experiments on both artificial and real-world datasets, we show that: 1) by utilizing historical data and considering long-term revenue gains, MARL achieves better performance than myopic online algorithms; 2) MARL is able to construct the mapping from complex scenarios to sophisticated decisions such as dispatching rules; and 3) MARL scales to large-scale real-world scenarios.
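The abstract mentions stacking multi-layer images of the spatial-temporal map as input to the MARL agents. A minimal sketch of what such a stacked observation could look like is below; the channel layout (pending demand, courier counts, normalized time) and the function name are illustrative assumptions, not the paper's exact input design.

```python
import numpy as np

def build_observation(demand, couriers, t, horizon):
    """Stack per-channel spatial maps into one image-like observation.

    demand:   2-D grid of pending pickup requests per city cell
    couriers: 2-D grid of courier counts per city cell
    t, horizon: current step and episode length, encoded as a
                constant time channel in [0, 1]
    Channel layout here is a hypothetical illustration.
    """
    time_channel = np.full(demand.shape, t / horizon)
    # Result has shape (channels, height, width), as a CNN-style input.
    return np.stack([demand, couriers, time_channel], axis=0)

demand = np.array([[2.0, 0.0], [1.0, 3.0]])
couriers = np.array([[1.0, 1.0], [0.0, 2.0]])
obs = build_observation(demand, couriers, t=5, horizon=10)
print(obs.shape)  # (3, 2, 2)
```

Each agent would receive such a tensor and map it to a dispatching decision; the per-agent decomposition is what avoids the centralized-MDP state explosion the abstract refers to.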


