International Joint Conference on Neural Networks

A multiagent reinforcement learning approach to en-route trip building



Abstract

An important stage in traffic planning is traffic assignment, which seeks to reproduce the way drivers select their routes. It assumes that each driver is aware of a number of routes from an origin to a destination, performs some experimentation, and rationally selects the route with the highest utility. This is the basis for many approaches that iteratively vary the combination of route choices in order to find one that maximizes utility. This perspective is therefore a centralized, aggregate one. In reality, though, drivers may experiment en route, i.e., deviate from the originally planned route. Thus, in this paper, individual drivers are modeled as active, autonomous agents that, instead of having a central entity assign complete trips to them, build these trips by experimentation during the actual trip. Agents learn their routes by deciding, at each node, how to continue toward their destinations so as to minimize their travel times. Because the choice of one agent impacts several others, this is a non-cooperative (and hence stochastic) multiagent learning problem, which is known to be much more challenging than single-agent reinforcement learning. To illustrate the approach, results from two non-trivial networks with thousands of learning agents are presented, which clearly constitutes a hard learning problem. Results are compared to iterative, centralized methods. It is concluded that an agent-based perspective yields choices that are more aligned with the real-world situation because (i) trips are computed by the agent itself (not provided by any central entity), and (ii) routing is not based on pre-computed paths but is built during the trip itself.
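The per-node decision rule described above can be sketched with tabular Q-learning on a toy network. This is a minimal single-agent illustration only: the network, node names, and parameters below are hypothetical, and the congestion coupling between agents (the defining difficulty of the multiagent setting) is deliberately omitted. The state is the current node, the action is the next outgoing link, and the reward is the negative link travel time, so minimizing travel time becomes maximizing return.

```python
import random

# Hypothetical toy network: each node maps outgoing links to travel times.
# These names and costs are illustrative, not taken from the paper.
GRAPH = {
    "A": {"B": 2.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 2.0},
    "D": {},  # destination: no outgoing links
}

def q_learn_route(origin, dest, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning: at each node, pick the next link epsilon-greedily;
    the reward is the negative travel time of the chosen link."""
    q = {(n, s): 0.0 for n in GRAPH for s in GRAPH[n]}
    for _ in range(episodes):
        node, steps = origin, 0
        while node != dest and GRAPH[node] and steps < 50:
            succ = list(GRAPH[node])
            if random.random() < eps:
                nxt = random.choice(succ)          # explore
            else:
                nxt = max(succ, key=lambda s: q[(node, s)])  # exploit
            reward = -GRAPH[node][nxt]
            future = max((q[(nxt, s)] for s in GRAPH[nxt]), default=0.0)
            q[(node, nxt)] += alpha * (reward + gamma * future - q[(node, nxt)])
            node, steps = nxt, steps + 1
    return q

def greedy_route(q, origin, dest):
    """Follow the learned Q-values greedily to build the trip en route."""
    node, path = origin, [origin]
    while node != dest and GRAPH[node] and len(path) <= 10:
        node = max(GRAPH[node], key=lambda s: q[(node, s)])
        path.append(node)
    return path

random.seed(0)
q = q_learn_route("A", "D")
print(greedy_route(q, "A", "D"))  # shortest path A-B-C-D (cost 5)
```

Note that the trip is never pre-computed: the agent only ever decides which link to take at its current node, which is exactly the en-route perspective the abstract contrasts with centralized assignment. In the paper's multiagent setting, the link travel times would additionally depend on how many agents choose each link, making the rewards non-stationary.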
