Engineering Applications of Artificial Intelligence

Accelerating decentralized reinforcement learning of complex individual behaviors


Abstract

Many real-world Reinforcement Learning (RL) applications have multi-dimensional action spaces that suffer from a combinatorial explosion of complexity. Implementing Centralized RL (CRL) systems may therefore become infeasible, due to the exponential increase of dimensionality in both the state space and the action space and the large number of training trials required. To address these issues, this paper proposes using Decentralized Reinforcement Learning (DRL) to alleviate the effects of the curse of dimensionality on the action space, and transferring knowledge to reduce the number of training episodes needed to achieve asymptotic convergence. Three DRL schemes are compared: DRL with independent learners and no prior coordination (DRL-Ind); DRL accelerated and coordinated by the Control Sharing (DRL+CoSh) knowledge transfer approach; and a proposed DRL scheme using the CoSh-based variant Nearby Action Sharing, which incorporates a measure of uncertainty into the CoSh procedure (DRL+NeASh). These three schemes are analyzed through an extensive experimental study and validated on two complex real-world problems, the inwalk-kicking and ball-dribbling behaviors, both performed with humanoid biped robots. The results show (empirically): (i) the effectiveness of DRL systems, which are able to achieve asymptotic convergence through indirect coordination even without prior coordination; (ii) that the proposed knowledge transfer methods make it possible to reduce the number of training episodes and to coordinate the DRL process; and (iii) that the obtained learning times are between 36% and 62% faster than the DRL-Ind scheme in the case studies.
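The DRL-Ind baseline described in the abstract can be sketched as independent Q-learners, one per action dimension, each observing the shared state and receiving the same global reward (the "indirect coordination" signal). The sketch below is a minimal illustration of that idea; the state/action sizes, hyperparameters, and function names are illustrative assumptions, not taken from the paper. Note the storage saving: a centralized learner needs |S|·|A|^d table entries, while d independent learners need only d·|S|·|A|.

```python
import random

# Illustrative sizes (assumptions, not from the paper): a shared state
# space of 100 states and 3 action dimensions with 5 choices each.
N_STATES, N_ACTIONS_PER_DIM, N_DIMS = 100, 5, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

# One independent Q-table per action dimension (DRL-Ind style).
q_tables = [[[0.0] * N_ACTIONS_PER_DIM for _ in range(N_STATES)]
            for _ in range(N_DIMS)]

def select_joint_action(state):
    """Each learner picks its own action component epsilon-greedily;
    the joint action is simply the tuple of the components."""
    joint = []
    for q in q_tables:
        if random.random() < EPSILON:
            joint.append(random.randrange(N_ACTIONS_PER_DIM))
        else:
            row = q[state]
            joint.append(row.index(max(row)))
    return tuple(joint)

def update(state, joint_action, reward, next_state):
    """All learners share the same global reward, which is what allows
    indirect coordination to emerge without explicit communication."""
    for q, a in zip(q_tables, joint_action):
        best_next = max(q[next_state])
        q[state][a] += ALPHA * (reward + GAMMA * best_next - q[state][a])

# Table-size comparison: centralized vs. decentralized storage.
centralized_entries = N_STATES * N_ACTIONS_PER_DIM ** N_DIMS    # 12500
decentralized_entries = N_DIMS * N_STATES * N_ACTIONS_PER_DIM   # 1500
```

With these toy sizes the decentralized factorization already shrinks the Q-representation by more than 8x, and the gap widens exponentially with the number of action dimensions, which is the motivation the abstract gives for DRL over CRL.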
