Venue: International Workshop on Multi-Agent-Based Simulation

Emergent Collective Behaviors in a Multi-agent Reinforcement Learning Pedestrian Simulation: A Case Study



Abstract

In this work, a Multi-agent Reinforcement Learning framework is used to generate simulations of virtual pedestrian groups. The aim is to study the influence of two different learning approaches on the quality of the generated simulations. The case study consists of simulating the crossing of two groups of embodied virtual agents inside a narrow corridor. This scenario is a classic experiment in pedestrian modeling, because a collective behavior, specifically lane formation, emerges with real pedestrians. The paper studies the influence of different learning algorithms, function approximation approaches, and knowledge transfer mechanisms on the performance of the learned pedestrian behaviors. Specifically, two different RL-based schemas are analyzed. The first one, Iterative Vector Quantization with Q-Learning (ITVQQL), iteratively improves a state-space generalizer based on vector quantization. The second schema, named TS, uses tile coding as the generalization method with the Sarsa(λ) algorithm. The knowledge transfer approach is based on Probabilistic Policy Reuse, which incorporates previously acquired knowledge into the current learning process; additionally, value function transfer is used in the ITVQQL schema to transfer the value function between consecutive iterations. Results demonstrate empirically that our RL framework generates individual behaviors from which the expected collective behavior emerges, as occurs with real pedestrians. This collective behavior appears independently of the learning algorithm and the generalization method used, but depends strongly on whether knowledge transfer was applied. In addition, the use of transfer techniques has a remarkable influence on the final performance (measured as the number of times the task was solved) of the learned behaviors.
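As a rough illustration of the Probabilistic Policy Reuse idea named in the abstract, the following sketch shows one common way such an exploration rule can be written: with some probability the agent follows a previously learned policy, and otherwise it acts epsilon-greedily on the value function currently being learned. This is a simplified sketch under stated assumptions, not the paper's implementation. The function name `pi_reuse_action`, the dictionary-based Q-table, and the parameters `psi` and `epsilon` are all hypothetical choices made for this example.

```python
import random


def pi_reuse_action(state, q_new, past_policy, actions, psi, epsilon, rng):
    """Pick an action with a simplified policy-reuse exploration rule.

    All names here are illustrative assumptions, not taken from the paper.
    q_new maps (state, action) pairs to estimated values; missing pairs
    are treated as 0.0. past_policy is a callable mapping state -> action.
    """
    # With probability psi, reuse the previously learned policy.
    if rng.random() < psi:
        return past_policy(state)
    # Otherwise explore uniformly at random with probability epsilon ...
    if rng.random() < epsilon:
        return rng.choice(actions)
    # ... or exploit the value function that is currently being learned.
    return max(actions, key=lambda a: q_new.get((state, a), 0.0))
```

In a learning loop, `psi` would typically be decayed over the steps of an episode so that the agent relies less on the transferred policy as its own estimates improve; the decay schedule is left out here because the abstract does not specify one.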
