International Conference on Knowledge Science, Engineering and Management

MA-TREX: Multi-agent Trajectory-Ranked Reward Extrapolation via Inverse Reinforcement Learning



Abstract

Trajectory-ranked reward extrapolation (T-REX) provides a general framework for inferring users' intentions from sub-optimal demonstrations. However, it becomes inflexible in multi-agent scenarios because of the high complexity introduced by rational behaviors such as cooperation and communication. In this paper, we propose a novel Multi-Agent Trajectory-ranked Reward Extrapolation framework (MA-TREX), which adopts inverse reinforcement learning to infer demonstrators' cooperative intentions in environments with high-dimensional state-action spaces. Specifically, to reduce the dependence on demonstrators, MA-TREX uses self-generated demonstrations to iteratively extrapolate the reward function. Moreover, a knowledge transfer method is adopted in the iteration process, so that the self-generated data required in subsequent iterations is only one third of the initial demonstrations. Experimental results on several multi-agent collaborative tasks demonstrate that MA-TREX can effectively surpass the demonstrators and quickly and stably obtain the same level of reward as the ground truth.
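For orientation, the sketch below illustrates the trajectory-ranking idea that T-REX (and hence MA-TREX) builds on: a reward network is trained so that the predicted return of a higher-ranked trajectory exceeds that of a lower-ranked one, using a Bradley-Terry style cross-entropy loss. This is a minimal single-agent sketch under assumed settings; the network sizes, tensor shapes, and the names RewardNet and trex_ranking_loss are illustrative, not the authors' implementation, which additionally handles multiple agents, iterative self-generated demonstrations, and knowledge transfer.

```python
import torch
import torch.nn as nn

# Hypothetical reward network: maps a (state, action) pair to a scalar reward.
# Dimensions are placeholders, not the paper's actual settings.
class RewardNet(nn.Module):
    def __init__(self, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def trex_ranking_loss(reward_net, traj_low, traj_high):
    """Bradley-Terry style ranking loss used by trajectory-ranked reward
    extrapolation: the predicted return of the higher-ranked trajectory
    should exceed that of the lower-ranked one."""
    ret_low = reward_net(*traj_low).sum()
    ret_high = reward_net(*traj_high).sum()
    logits = torch.stack([ret_low, ret_high]).unsqueeze(0)
    # Cross-entropy with the higher-ranked trajectory as the target class.
    return nn.functional.cross_entropy(logits, torch.tensor([1]))

# Toy usage with random tensors standing in for two ranked demonstrations.
if __name__ == "__main__":
    net = RewardNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    traj_low = (torch.randn(50, 8), torch.randn(50, 2))   # lower-ranked demo
    traj_high = (torch.randn(50, 8), torch.randn(50, 2))  # higher-ranked demo
    loss = trex_ranking_loss(net, traj_low, traj_high)
    opt.zero_grad()
    loss.backward()
    opt.step()
```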

