International Conference on Knowledge Science, Engineering and Management

MA-TREX: Multi-agent Trajectory-Ranked Reward Extrapolation via Inverse Reinforcement Learning



Abstract

Trajectory-ranked reward extrapolation (T-REX) provides a general framework for inferring users' intentions from sub-optimal demonstrations. However, it becomes inflexible in multi-agent scenarios because of the high complexity introduced by rational behaviors such as cooperation and communication. In this paper, we propose a novel Multi-Agent Trajectory-ranked Reward Extrapolation framework (MA-TREX), which adopts inverse reinforcement learning to infer demonstrators' cooperative intentions in environments with high-dimensional state-action spaces. Specifically, to reduce the dependence on demonstrators, MA-TREX uses self-generated demonstrations to iteratively extrapolate the reward function. Moreover, a knowledge transfer method is adopted in the iteration process, so that the self-generated data required in subsequent iterations is only one third of the initial demonstrations. Experimental results on several multi-agent collaborative tasks demonstrate that MA-TREX can effectively surpass the demonstrators and quickly and stably obtain the same level of reward as the ground truth.
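For orientation, the sketch below illustrates the trajectory-ranking idea that T-REX (and hence MA-TREX) builds on: a reward network is trained so that the predicted return of a higher-ranked trajectory exceeds that of a lower-ranked one, using a Bradley-Terry style cross-entropy loss. This is a minimal single-agent sketch under assumed settings; the network sizes, tensor shapes, and the names RewardNet and trex_ranking_loss are illustrative, not the authors' implementation, which additionally handles multiple agents, iterative self-generated demonstrations, and knowledge transfer.

```python
import torch
import torch.nn as nn

# Hypothetical reward network: maps a (state, action) pair to a scalar reward.
# Dimensions are placeholders, not the paper's actual settings.
class RewardNet(nn.Module):
    def __init__(self, obs_dim=8, act_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def trex_ranking_loss(reward_net, traj_low, traj_high):
    """Bradley-Terry style ranking loss used by trajectory-ranked reward
    extrapolation: the predicted return of the higher-ranked trajectory
    should exceed that of the lower-ranked one."""
    ret_low = reward_net(*traj_low).sum()
    ret_high = reward_net(*traj_high).sum()
    logits = torch.stack([ret_low, ret_high]).unsqueeze(0)
    # Cross-entropy with the higher-ranked trajectory as the target class.
    return nn.functional.cross_entropy(logits, torch.tensor([1]))

# Toy usage with random tensors standing in for two ranked demonstrations.
if __name__ == "__main__":
    net = RewardNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    traj_low = (torch.randn(50, 8), torch.randn(50, 2))   # lower-ranked demo
    traj_high = (torch.randn(50, 8), torch.randn(50, 2))  # higher-ranked demo
    loss = trex_ranking_loss(net, traj_low, traj_high)
    opt.zero_grad()
    loss.backward()
    opt.step()
```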

