
An exploratory rollout policy for imagination-augmented agents

Abstract

Typical reinforcement learning methods lack planning and thus require large amounts of training data to achieve the expected performance. Imagination-Augmented Agents (I2A), a model-based approach, learn to extract information from imagined trajectories to construct implicit plans, and show improved data efficiency and performance. However, in I2A these imagined trajectories are generated by a shared rollout policy, which makes them look similar and carry little information. We propose an exploratory rollout policy named E-I2A. When the agent's performance is poor, E-I2A produces diverse imagined trajectories that are more informative. As the agent's performance improves with training, the trajectories generated by E-I2A become consistent with the agent's trajectories in the real environment and yield high rewards. To achieve this, we first quantify the novelty of a state by training an inverse dynamics model; the agent then picks the states with the highest novelty to generate diverse trajectories. Simultaneously, we train a distilled value-function model to estimate the expected return of a state, which lets us imagine the highest-return states and keep the imagined trajectories consistent with the real ones. Finally, we propose an adaptive method that shifts the imagined trajectories from initially diverse to eventually consistent, further improving the agent's performance. By offering more information at decision time, our method demonstrates improved performance and data efficiency. We evaluated E-I2A on several challenging domains, including MiniPacman and Sokoban, where it outperforms several baselines.
