IEEE International Conference on Automation Science and Engineering

Constructive Policy: Reinforcement Learning Approach for Connected Multi-Agent Systems

Abstract

Policy-based reinforcement learning methods are widely used in multi-agent systems to learn optimal actions for any given state, with partial or even no model representation. However, multi-agent systems with complex structures (the curse of dimensionality) or with strong constraints (such as bio-inspired snake or serpentine robots) show limited performance in such settings, due to the sparse-reward nature of the environment and the lack of a fully observable model representation. In this paper we present a constructive learning and planning scheme that reduces the complexity of a high-dimensional agent model by decomposing it into an identical, connected, and scaled-down multi-agent structure, and then applies a learning framework in layers of local and global ranking. Our layered hierarchy method also decomposes the final goal into multiple sub-tasks and a global task (the final goal) that is a bias-induced function of the local sub-tasks. The local layer learns a 'reusable' local policy with which a local agent achieves its sub-task optimally; that local policy can then be reused by other identical local agents. The global layer, in turn, learns a policy for applying the right combination of local policies, parameterized over the entire connected structure of local agents, so that the global task is achieved through the collaborative construction of the local agents. After the local policies are learned and while the global policy is being learned, the framework generates sub-tasks for each local agent and accepts the local agents' intrinsic rewards as a positive bias toward the maximum global reward, based on optimal sub-task assignments. The advantages of the proposed approach include better exploration, due to the decomposition of dimensions, and reusability of the learning paradigm over extended dimension spaces. We apply the constructive policy method to a serpentine robot with hyper-redundant degrees of freedom (DOF) to achieve optimal control, and we outline a connection to hierarchical apprenticeship learning methods, which can be seen as a layered learning framework for complex control tasks.
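To make the layered scheme concrete, below is a minimal sketch of how a single shared local policy and a global sub-task-assignment layer could compose into one high-DOF action for a segmented serpentine robot. All names here (LocalPolicy, GlobalPolicy, constructive_action) and the simple linear-Gaussian parameterizations are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Minimal sketch of a two-layer "constructive policy": one reusable local
# policy shared by identical local agents, and a global layer that assigns
# each agent a sub-task. Hypothetical structure, not the authors' code.
import numpy as np

class LocalPolicy:
    """One reusable policy shared by every identical local agent (e.g. one
    robot segment). Maps a local state plus an assigned sub-task descriptor
    to a bounded local action via a simple linear parameterization."""
    def __init__(self, state_dim, subtask_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, (action_dim, state_dim + subtask_dim))

    def act(self, local_state, subtask):
        x = np.concatenate([local_state, subtask])
        return np.tanh(self.W @ x)  # bounded joint command

class GlobalPolicy:
    """Global layer: observes the concatenated states of all local agents
    and emits one sub-task per agent, i.e. the 'right combination' of the
    shared local policy parameterized over the whole connected structure."""
    def __init__(self, n_agents, state_dim, subtask_dim, seed=1):
        rng = np.random.default_rng(seed)
        self.n_agents, self.subtask_dim = n_agents, subtask_dim
        self.W = rng.normal(0.0, 0.1, (n_agents * subtask_dim, n_agents * state_dim))

    def assign_subtasks(self, global_state):
        z = self.W @ global_state
        return z.reshape(self.n_agents, self.subtask_dim)

def constructive_action(global_policy, local_policy, local_states):
    """Compose the full high-DOF action from the two layers."""
    global_state = np.concatenate(local_states)
    subtasks = global_policy.assign_subtasks(global_state)
    return np.stack([local_policy.act(s, g)
                     for s, g in zip(local_states, subtasks)])

# Example: a 10-segment robot, each segment with a 4-D local state,
# a 2-D sub-task descriptor, and a 1-D joint action.
local = LocalPolicy(state_dim=4, subtask_dim=2, action_dim=1)
glob = GlobalPolicy(n_agents=10, state_dim=4, subtask_dim=2)
states = [np.zeros(4) for _ in range(10)]
print(constructive_action(glob, local, states).shape)  # (10, 1)
```

In the paper's full scheme, the global layer would additionally be trained with each local agent's intrinsic reward accepted as a positive bias toward the maximum global reward under optimal sub-task assignments; that training loop is omitted from this sketch.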
