IEEE International Conference on Automation Science and Engineering

Constructive Policy: Reinforcement Learning Approach for Connected Multi-Agent Systems

Abstract

Policy-based reinforcement learning methods are widely used in multi-agent systems to learn optimal actions for any given state, with partial or even no model representation. However, multi-agent systems with complex structures (the curse of dimensionality) or with strong constraints (such as bio-inspired snake or serpentine robots) show limited performance in such settings, due to the sparse-reward nature of the environment and the lack of a fully observable model representation. In this paper we present a constructive learning and planning scheme that reduces the complexity of a high-dimensional agent model by decomposing it into an identical, connected, and scaled-down multi-agent structure, and then applies the learning framework in layers of local and global rank. Our layered hierarchy also decomposes the final goal into multiple sub-tasks and a global task (the final goal) that is a bias-induced function of the local sub-tasks. The local layer learns a 'reusable' local policy with which a local agent achieves a sub-task optimally; that local policy can then be reused by other identical local agents. The global layer, in turn, learns a policy that applies the right combination of local policies, parameterized over the entire connected structure of local agents, to achieve the global task through the collaborative construction of local agents. After the local policies are learned, and while the global policy is being learned, the framework generates sub-tasks for each local agent and accepts the local agents' intrinsic rewards as a positive bias toward the maximum global reward, based on optimal sub-task assignments. The advantages of the proposed approach include better exploration due to the decomposition of dimensions, and reusability of the learning paradigm over extended dimension spaces. We apply the constructive policy method to a serpentine robot with hyper-redundant degrees of freedom (DOF) to achieve optimal control, and we also outline a connection to hierarchical apprenticeship learning methods, which can be seen as a layered learning framework for complex control tasks.
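To make the layered scheme concrete, below is a minimal toy sketch in Python (not the authors' implementation) of the two layers the abstract describes: a single tabular Q-learning policy learned once at the local layer and shared by all identical segments, and a global layer that scores candidate sub-task assignments with a global shape reward positively biased by the segments' intrinsic rewards. All names, the discretized environment, and the candidate assignments are illustrative assumptions.

import random

N_SEG = 6                       # identical local agents (robot segments)
ANGLES = [-2, -1, 0, 1, 2]      # discretized joint angles
ACTIONS = (-1, 0, 1)            # local action: decrement / hold / increment angle

class LocalPolicy:
    """One reusable tabular Q-learning policy shared by all identical segments.
    The state is the signed error (angle - target), so the same Q-table
    solves any sub-task of the form 'reach target angle t'."""
    def __init__(self, alpha=0.2, gamma=0.9, eps=0.1):
        self.q = {(e, a): 0.0 for e in range(-4, 5) for a in ACTIONS}
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, err, greedy=False):
        if not greedy and random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(err, a)])

    def update(self, err, a, r, err2):
        best = max(self.q[(err2, b)] for b in ACTIONS)
        self.q[(err, a)] += self.alpha * (r + self.gamma * best - self.q[(err, a)])

def run_segment(policy, target, steps=10, learn=True):
    """Drive one segment toward its assigned sub-task; return (intrinsic reward, final angle)."""
    angle, total = 0, 0.0
    for _ in range(steps):
        err = angle - target
        a = policy.act(err, greedy=not learn)
        angle = max(min(angle + a, 2), -2)
        r = -abs(angle - target)          # intrinsic reward: closeness to the sub-task
        if learn:
            policy.update(err, a, r, angle - target)
        total += r
    return total, angle

# Local layer: learn the reusable sub-task policy once, over random targets.
local = LocalPolicy()
for _ in range(2000):
    run_segment(local, target=random.choice(ANGLES))

# Global layer: choose the sub-task assignment over the connected structure.
# Candidates mimic simple serpentine patterns; the global reward is the match
# to a desired body shape, positively biased by the local intrinsic rewards.
candidates = [
    tuple((i % 3) - 1 for i in range(N_SEG)),        # shallow wave
    tuple(2 * (i % 2) - 1 for i in range(N_SEG)),    # alternating
    tuple(0 for _ in range(N_SEG)),                  # straight
]
desired_shape = tuple((i % 3) - 1 for i in range(N_SEG))

def global_reward(assignment):
    intrinsic, angles = 0.0, []
    for target in assignment:
        r, angle = run_segment(local, target, learn=False)
        intrinsic += r
        angles.append(angle)
    shape_err = sum(abs(a - d) for a, d in zip(angles, desired_shape))
    return -shape_err + 0.1 * intrinsic  # intrinsic rewards bias the global reward

best = max(candidates, key=global_reward)
print("chosen sub-task assignment:", best)

In this toy version the "global policy" degenerates to a one-step choice over a fixed candidate set; the paper's global layer is a learned policy over the full connected structure, but the bias structure (global task reward plus weighted local intrinsic rewards) is the same idea.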
