Journal: Autonomous Agents and Multi-Agent Systems

Decomposition methods with deep corrections for reinforcement learning



Abstract

Decomposition methods have been proposed to approximate solutions to large sequential decision-making problems. In contexts where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks, each considering an individual entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong independence assumptions between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem, where multiple boats must coordinate to maximize their catch over time, as well as on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods can scale to multiple boats or pedestrians by reusing strategies trained on a single entity. We verify empirically that the proposed correction method significantly improves the decomposition method and outperforms a policy trained on the full-scale problem without utility decomposition.
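The core idea in the abstract can be sketched in a few lines: each entity contributes a local utility, the arbitrator combines them (here, by summing), and a small neural network adds a learned correction on top of the combined estimate. This is a minimal illustrative sketch, not the paper's implementation; the linear local Q-functions, the toy state dimensions, and the single-hidden-layer correction network (with random, untrained weights) are all assumptions made for the example.

```python
# Sketch: utility decomposition with an additive learned correction term.
# All model shapes and weights here are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 3    # shared action space of the arbitrator
N_ENTITIES = 2   # e.g. boats or pedestrians
STATE_DIM = 4    # toy per-entity state features

def local_q(entity_state, w):
    """Per-entity utility: a toy linear Q over actions, shape (N_ACTIONS,)."""
    return w @ entity_state

# Pretend each single-entity policy was trained separately; random weights here.
local_weights = [rng.normal(size=(N_ACTIONS, STATE_DIM)) for _ in range(N_ENTITIES)]

def decomposed_q(states):
    """Arbitrator: sum local utilities (relies on the independence assumption)."""
    return sum(local_q(s, w) for s, w in zip(states, local_weights))

# Correction term delta(s, .): one hidden layer in raw numpy, standing in for
# the neural network the paper would train against the full-scale problem.
H = 8
W1 = rng.normal(scale=0.1, size=(H, N_ENTITIES * STATE_DIM))
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, H))

def correction(states):
    x = np.concatenate(states)      # joint state seen only by the correction
    h = np.tanh(W1 @ x)
    return W2 @ h                   # shape (N_ACTIONS,)

def corrected_policy(states):
    """Act greedily on Q_dec(s, a) + delta(s, a), the corrected global utility."""
    q = decomposed_q(states) + correction(states)
    return int(np.argmax(q))

states = [rng.normal(size=STATE_DIM) for _ in range(N_ENTITIES)]
action = corrected_policy(states)
```

The decomposed term alone scales linearly with the number of entities, while the correction network is the only component that sees the joint state; training that small network is cheaper than learning a policy for the full joint problem from scratch.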
