JMLR: Workshop and Conference Proceedings

Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations


Abstract

This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be suboptimal. Compared to previous works that decouple agents in the game by assuming optimality in expert policies, we introduce a new objective function that directly pits experts against Nash Equilibrium policies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. To find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to existing benchmark algorithms. Moreover, our algorithm successfully recovers reward and policy functions regardless of the quality of the sub-optimal expert demonstration set.
