IEEE Symposium Series on Computational Intelligence

The True Online Continuous Learning Automation (TOCLA) in a continuous control benchmarking of actor-critic algorithms

Abstract

Reinforcement learning problems are often discretised, use linear function approximation, or perform batch updates. However, many applications that can benefit from reinforcement learning contain continuous variables and are inherently non-linear, for example, the control of aerospace or maritime robotic vehicles. Recent work has brought focus onto online temporal-difference methods, specifically those using non-linear function approximation. In this paper, we evaluate the Forward Actor-Critic against the regular Actor-Critic and Continuous Actor-Critic Learning Automation. We also propose and evaluate a new algorithm, called True Online Continuous Learning Automation (TOCLA), which combines these two approaches. The chosen benchmark problem was the MountainCarContinuous-v0 environment from OpenAI Gym, which represents a further step in complexity over the benchmark used to test the Forward Actor-Critic in previous works. Our results demonstrate the superiority of TOCLA in terms of its sensitivity to hyper-parameter selection compared with the Forward Actor-Critic, Continuous Actor-Critic Learning Automation, and Actor-Critic algorithms.
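The CACLA-style update that the abstract's algorithms build on adjusts the actor only when the temporal-difference error is positive, i.e. when the executed action did better than the critic predicted. The following is a minimal illustrative sketch of that idea on a hypothetical 1-D toy task with linear function approximation; it is not the paper's TOCLA algorithm (which additionally uses true-online eligibility traces) and not the Mountain Car benchmark, and all names and constants here are assumptions for illustration.

```python
import random

random.seed(0)

# Illustrative CACLA-style actor-critic on a hypothetical 1-D task.
# Linear features for simplicity; TOCLA itself targets non-linear
# function approximation and adds true-online eligibility traces.

ALPHA_V, ALPHA_A, GAMMA = 0.1, 0.1, 0.9

def features(s):
    return [1.0, s]              # bias + raw state

v_w = [0.0, 0.0]                 # critic weights (state value)
a_w = [0.0, 0.0]                 # actor weights (policy mean)

def value(s):
    return sum(w * f for w, f in zip(v_w, features(s)))

def policy_mean(s):
    return sum(w * f for w, f in zip(a_w, features(s)))

def env_step(s, a):
    # Toy dynamics: state drifts with the action (clipped to [-1, 1]);
    # reward peaks when the action is near +1.
    s_next = max(-1.0, min(1.0, s + 0.1 * a))
    return s_next, -abs(a - 1.0)

s = 0.0
for _ in range(500):
    a = policy_mean(s) + random.gauss(0.0, 0.5)       # Gaussian exploration
    s_next, r = env_step(s, a)
    delta = r + GAMMA * value(s_next) - value(s)      # TD error
    for i, f in enumerate(features(s)):               # critic: TD(0) update
        v_w[i] += ALPHA_V * delta * f
    if delta > 0:                                     # actor: CACLA rule --
        err = a - policy_mean(s)                      # move the policy mean
        for i, f in enumerate(features(s)):           # toward actions that
            a_w[i] += ALPHA_A * err * f               # beat the critic's estimate
    s = s_next
```

The key design choice sketched here is the sign test on the TD error: the critic learns on every step, while the actor learns only from actions that outperformed the current value estimate.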
