Journal of Advanced Computational Intelligence and Intelligent Informatics

Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation

Abstract

In long-term reinforcement learning tasks, learning efficiency degrades severely because the agent's probabilistic action selection often causes the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, this paper proposes a fixed mode state. If the agent acquires an adequate reward, a normal state is switched to a fixed mode state. In a fixed mode state, the agent selects its action with a greedy strategy, i.e., it deterministically selects the highest-weight action. First, the paper combines Online Profit Sharing reinforcement learning with the Penalty Avoiding Rational Policy Making algorithm and then introduces fixed mode states into the combined method. Next, the target task is formulated: learning a modified waist trajectory for dynamically stable walking, based on the statically stable walking of a biped robot. Finally, simulation results are presented and the effectiveness of the proposed method is discussed.
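The switch between probabilistic and greedy selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how actions are chosen in a normal state, what counts as an "adequate" reward, or whether a fixed mode state can revert. The sketch assumes weight-proportional (roulette) selection in normal states, a hypothetical `reward_threshold` parameter, and a switch that stays on once triggered. The function names and the list-of-weights layout are also illustrative.

```python
import random


def select_action(weights, fixed_mode, rng=random):
    """Pick an action index from one state's action weights.

    weights: non-negative floats, one per action (hypothetical layout).
    fixed_mode: True once this state has been switched to fixed mode.
    """
    actions = range(len(weights))
    if fixed_mode:
        # Fixed mode state: deterministically take the highest-weight action.
        return max(actions, key=lambda a: weights[a])
    # Normal state: weight-proportional (roulette) selection.
    # This selection rule is an assumption, not taken from the abstract.
    return rng.choices(actions, weights=weights, k=1)[0]


def update_mode(fixed_mode, acquired_reward, reward_threshold):
    """Switch a normal state to a fixed mode state after enough reward.

    reward_threshold is a hypothetical parameter; the abstract only says
    "an adequate reward". Keeping the mode on once set is also an assumption.
    """
    return fixed_mode or acquired_reward >= reward_threshold
```

With weights `[0.1, 0.7, 0.2]`, a fixed mode state always returns action 1, while a normal state returns it only about 70% of the time under the assumed roulette rule. Over a long action sequence, removing that per-step randomness is what keeps an already-learned behavior from failing by chance.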
