Conference: International Joint Conference on Neural Networks

A multiagent reinforcement learning approach to en-route trip building

Abstract

An important stage in traffic planning is traffic assignment, which seeks to reproduce the way drivers select their routes. It assumes that each driver knows a number of routes from an origin to a destination, performs some experimentation, and rationally selects the route with the highest utility. This is the basis for many approaches that iteratively vary the combination of route choices in order to find one that maximizes utility. This perspective is therefore centralized and aggregate. In reality, though, drivers may experiment en route, i.e., deviate from the originally planned route. In this paper, individual drivers are therefore treated as active, autonomous agents: instead of having a central entity assign a complete trip to each agent, the agents build their trips by experimentation during the actual trip. Agents learn their routes by deciding, at each node, how to continue toward their own destination so as to minimize travel time. Because the choice of one agent affects several others, this is a non-cooperative multiagent learning problem (and thus a stochastic one), which is known to be much more challenging than single-agent reinforcement learning. To illustrate the approach, results are presented for two non-trivial networks with thousands of learning agents, which clearly constitutes a hard learning problem. Results are compared with those of iterative, centralized methods. It is concluded that an agent-based perspective yields choices that are more closely aligned with the real-world situation because (i) trips are computed by the agents themselves (and not provided by any central entity), and (ii) trips are not based on pre-computed paths (rather, they are built during the trip itself).
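The node-by-node decision process described in the abstract can be illustrated with a minimal sketch. The abstract does not name the learning rule, so tabular Q-learning is assumed here. The toy network, the linear congestion function, and every name and parameter (`NETWORK`, `travel_time`, `run_episode`, `train`, `alpha`, `epsilon`) are hypothetical and are not taken from the paper. Each agent keeps its own Q-table, picks an outgoing link at its current node, and receives the negative of the congestion-dependent travel time as its reward.

```python
import random
from collections import defaultdict

# Hypothetical toy network (acyclic): node -> {successor: (free_flow_time, capacity)}
NETWORK = {
    "A": {"B": (5.0, 10), "C": (7.0, 10)},
    "B": {"D": (6.0, 10), "C": (1.0, 10)},
    "C": {"D": (4.0, 10)},
    "D": {},
}


def travel_time(free_flow, capacity, flow):
    """Assumed linear congestion: link time grows with the number of agents on it."""
    return free_flow * (1.0 + flow / capacity)


def run_episode(q_tables, origin, dest, alpha, epsilon, rng):
    """All agents move in lockstep; each picks its next link at its current node."""
    positions = [origin] * len(q_tables)
    totals = [0.0] * len(q_tables)
    while any(p != dest for p in positions):
        choices = {}
        flows = defaultdict(int)
        for i, node in enumerate(positions):
            if node == dest:
                continue
            successors = list(NETWORK[node])
            if rng.random() < epsilon:
                nxt = rng.choice(successors)  # explore
            else:
                nxt = max(successors, key=lambda s: q_tables[i][(node, s)])  # exploit
            choices[i] = nxt
            flows[(node, nxt)] += 1
        for i, nxt in choices.items():
            node = positions[i]
            free_flow, capacity = NETWORK[node][nxt]
            t = travel_time(free_flow, capacity, flows[(node, nxt)])
            q = q_tables[i]
            # Best estimated value from the next node (0 at the destination).
            future = max((q[(nxt, s)] for s in NETWORK[nxt]), default=0.0)
            # Undiscounted Q-learning update with reward = -travel time.
            q[(node, nxt)] += alpha * (-t + future - q[(node, nxt)])
            positions[i] = nxt
            totals[i] += t
    return sum(totals) / len(totals)


def train(n_agents=20, episodes=300, alpha=0.1, epsilon=0.1, seed=0):
    """Return the per-agent Q-tables and the mean travel time of each episode."""
    rng = random.Random(seed)
    q_tables = [defaultdict(float) for _ in range(n_agents)]
    history = [
        run_episode(q_tables, "A", "D", alpha, epsilon, rng)
        for _ in range(episodes)
    ]
    return q_tables, history


if __name__ == "__main__":
    _, history = train()
    print(f"mean travel time, first episode: {history[0]:.2f}")
    print(f"mean travel time, last episode:  {history[-1]:.2f}")
```

Because each agent's reward depends on how many other agents chose the same link in the same step, the environment seen by any single agent is non-stationary, which is the difficulty the abstract attributes to the multiagent setting. This sketch makes no claim about convergence or about the networks and results reported in the paper.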